Patent abstract:
Intein-modified enzymes, their production and industrial applications. A method of predicting an intein insertion site in a protein that will lead to a switching phenotype is provided. The method includes identifying a plurality of C/T/S sites within the protein and selecting, from the plurality of C/T/S sites, those that are scored at 0.75 or higher by a support vector machine, lie within ten angstroms of the active site of the protein, and are at or near a loop-β-sheet junction or a loop-helix junction. Methods of controlling protein activity and hosts including proteins with controlled activity are also provided. In addition, intein-modified proteins and plants containing intein-modified proteins are provided.
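The site-selection criteria in the abstract can be expressed as a simple filter. The Python sketch below is illustrative only: it assumes that the support vector machine score, the distance to the active site and the loop-junction annotation have already been computed for each site by some upstream structural analysis. The names `CandidateSite`, `find_cts_sites` and `select_insertion_sites`, and the example values, are hypothetical.

```python
from dataclasses import dataclass

# Cys, Thr and Ser: the residue types considered as candidate insertion sites.
CTS_RESIDUES = {"C", "T", "S"}


@dataclass
class CandidateSite:
    position: int                   # 1-based residue number in the target protein
    residue: str                    # one-letter amino acid code
    svm_score: float                # assumed: precomputed support vector machine score
    distance_to_active_site: float  # assumed: in angstroms, from a structural model
    near_loop_junction: bool        # assumed: at/near a loop-beta-sheet or loop-helix junction


def find_cts_sites(sequence: str) -> list[tuple[int, str]]:
    """Identify every C, T or S residue as (1-based position, residue)."""
    return [(i + 1, aa) for i, aa in enumerate(sequence.upper()) if aa in CTS_RESIDUES]


def select_insertion_sites(
    candidates: list[CandidateSite],
    min_score: float = 0.75,
    max_distance: float = 10.0,
) -> list[CandidateSite]:
    """Keep only the sites that satisfy all three criteria from the abstract."""
    return [
        site
        for site in candidates
        if site.residue in CTS_RESIDUES
        and site.svm_score >= min_score
        and site.distance_to_active_site <= max_distance
        and site.near_loop_junction
    ]


if __name__ == "__main__":
    print(find_cts_sites("MSTKCAG"))  # [(2, 'S'), (3, 'T'), (5, 'C')]
    # Made-up values for illustration only.
    sites = [
        CandidateSite(10, "T", 0.81, 7.2, True),
        CandidateSite(20, "S", 0.92, 9.5, True),
        CandidateSite(30, "C", 0.60, 4.0, True),   # fails the score cutoff
        CandidateSite(40, "S", 0.88, 15.0, True),  # too far from the active site
    ]
    print([s.position for s in select_insertion_sites(sites)])  # [10, 20]
```

The three criteria are applied as a conjunction, so a site is kept only when it passes every test; the cutoffs are exposed as parameters so that the 0.75 score and 10 Å distance stated in the abstract can be varied.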
Publication number: BR112012010744B1
Application number: R112012010744-5
Filing date: 2010-11-05
Publication date: 2020-09-29
Inventors: R. Michael Raab; Binzhang Shen; Gabor Lazar; Humberto de la Vega; James Apgar; Phillip Lessard
Applicant: Agrivida, Inc.
Main IPC:
Description:

This invention was made at least in part with government support under grant number DE-AR0000042 from the Advanced Research Projects Agency-Energy (ARPA-E) of the United States Department of Energy. The government has certain rights in this invention.
This application is a continuation-in-part of US Patent Application No. 12/590,444, filed on November 6, 2009, which is incorporated herein by reference as if fully set forth.
The Sequence Listing submitted electronically with this application, entitled “Sequence Listing”, created on November 5, 2010 and having a size of 14,792,733 bytes, is incorporated herein by reference as if fully set forth.
FIELD OF THE INVENTION
The invention concerns the control of protein activity.
BACKGROUND
Many proteins have useful characteristics, but in certain situations a protein can be difficult to use. For example, hydrolytic enzymes have important industrial and agricultural applications, but their expression and production may be associated with unwanted phenotypic effects in some expression hosts. Cell wall-degrading enzymes, which include cellulases, xylanases, ligninases, esterases, peroxidases and other hydrolytic enzymes, are often associated with detrimental effects on growth, physiological performance and agronomic performance when expressed in plants. Xylanases are enzymes that catalyze the hydrolysis of beta-1,4-xylan, a linear polysaccharide component of the hemicellulose contained in plant cell walls. Cellulases are enzymes that catalyze the internal or external hydrolysis of glucose polymers linked by beta-1,4-D-glycosidic bonds, as found in cellulose, cellulose chains of varying degrees of polymerization, and cellobiose. Because of these activities, the expression of a xylanase or cellulase in a plant can lead to undesirable degradation of plant components. Some enzymes can also be poorly expressed in microbial hosts because of their hydrolytic activity.
SUMMARY
In one aspect, the invention relates to an isolated protein having an amino acid sequence with at least 90% identity to a sequence selected from the group consisting of SEQ ID NOS: 2373 - 2686 and 3315 - 3322.
In one aspect, the invention relates to an isolated nucleic acid having a nucleotide sequence that encodes an amino acid sequence with at least 90% identity to a sequence selected from the group consisting of SEQ ID NOS: 2373 - 2686 and 3315 - 3322.
In one aspect, the invention relates to a transgenic plant including an isolated nucleic acid having a nucleotide sequence with at least 90% identity to a sequence selected from the group consisting of SEQ ID NOS: 2373 - 2686 and 3315 - 3322.
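The identity thresholds above presuppose a way of scoring two sequences against each other. This section does not state how percent identity is calculated, so the Python sketch below assumes one common convention: the two sequences have already been aligned (gaps written as "-"), and identity is the number of identical residue pairs divided by the number of alignment columns that are not gaps in both sequences. Other conventions, such as dividing by the length of the shorter sequence, give different values. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a column counts as
    identical only if both residues are the same and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a double-gap column carries no information
        columns += 1
        if a == b:
            identical += 1  # cannot be a gap here, since double gaps were skipped
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair has at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACDEFGHIKL", "ACDEFGHIKM")` gives 90.0 (9 identical columns out of 10), which just meets a 90% threshold.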
In one aspect, the invention relates to an isolated nucleic acid having a nucleotide sequence that hybridizes under conditions of moderate stringency to a sequence selected from the group consisting of SEQ ID NOS: 2687 - 3000 and 3323 - 3330.
In one aspect, the invention relates to a transgenic plant including an isolated nucleic acid having a nucleotide sequence that hybridizes under conditions of moderate stringency to a sequence selected from the group consisting of SEQ ID NOS: 2687 - 3000 and 3323 - 3330.
In one aspect, the invention relates to an isolated amino acid sequence comprising a contiguous amino acid sequence having at least 90% identity to 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500 or 10 to all contiguous amino acid residues of a protein having the sequence of any of SEQ ID NOS: 2373 - 2686 and 3315 - 3322. The protein has an intein sequence, an enzyme sequence, an upstream intein-extein junction and a downstream intein-extein junction. The protein having the sequence of one of SEQ ID NOS: 3315 - 3322 exhibits at least one amino acid change from SEQ ID NO: 2518. The isolated amino acid sequence includes at least one of: the upstream intein-extein junction, the downstream intein-extein junction, or one or more of the at least one amino acid change from SEQ ID NO: 2518.
In one aspect, the invention relates to an antibody that recognizes an epitope on an isolated amino acid sequence comprising a contiguous amino acid sequence having at least 90% identity to 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500 or 10 to all contiguous amino acid residues of a protein having the sequence of any of SEQ ID NOS: 2373 - 2686 and 3315 - 3322. The protein has an intein sequence, an enzyme sequence, an upstream intein-extein junction and a downstream intein-extein junction. The protein having the sequence of one of SEQ ID NOS: 3315 - 3322 has at least one amino acid change from SEQ ID NO: 2518. The isolated amino acid sequence includes at least one of: an upstream intein-extein junction, a downstream intein-extein junction, or one or more of the at least one amino acid change from SEQ ID NO: 2518.
In one aspect, the invention relates to an isolated nucleic acid having a sequence that encodes a contiguous amino acid sequence having at least 90% identity to 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500 or 10 to all contiguous amino acid residues of a protein having the sequence of one of SEQ ID NOS: 2373 - 2686 and 3315 - 3322. The protein has an intein sequence, an enzyme sequence, an upstream intein-extein junction and a downstream intein-extein junction. The protein having the sequence of one of SEQ ID NOS: 3315 - 3322 exhibits at least one amino acid change from SEQ ID NO: 2518. The isolated amino acid sequence includes at least one of: the upstream intein-extein junction, the downstream intein-extein junction, or one or more of the at least one amino acid change from SEQ ID NO: 2518.
BRIEF DESCRIPTION OF THE DRAWINGS
The following detailed description of the preferred embodiments will be better understood when read in conjunction with the attached drawings. For the purpose of illustrating the invention, embodiments that are currently preferred are shown in the drawings. It is understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown. In the drawings:
FIG. 1 illustrates the distance of intein insertion sites from the active site of a protein. Diamonds indicate insertion sites and squares indicate other C/S/T sites, where no intein was inserted.
FIG. 2A illustrates a plant expression vector, which is designated pAG2005 (SEQ ID NO: 1).
FIG. 2B illustrates pAG2005 (SEQ ID NO: 1) in greater detail.
FIGS. 3A to 3L illustrate western blot data for Tth intein-modified P77853, in which the intein is inserted at either serine 158 (S158) or threonine 134 (T134) of the P77853 enzyme. In some of FIGS. 3A to 3L, parts of the western blots are covered to focus on a specific set of lanes. The agar plate phenotype of each sample is denoted at the top of its lane. The agar plate phenotypes are given as “SW” for a switching phenotype, “TSP” for a temperature sensitive switching phenotype and “P” for a permissive phenotype. In each of FIGS. 3A to 3L, NIC indicates the intein-modified protein containing the N-extein, intein and C-extein, and NC indicates the spliced protein containing the N- and C-exteins.
FIG. 3A illustrates a western blot showing the protein P77853-Tth-S158-2 (SEQ ID NO: 1672), which was heat treated at 37 °C (panel 2, left lane) or 55 °C (panel 2, right lane) for four hours. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein, which were heat treated in the same way.
FIG. 3B illustrates a western blot showing the protein P77853-Tth-S158-4 (SEQ ID NO: 1673), which was heat treated at 37 °C (panel 4, left lane) or 55 °C (panel 4, right lane) for four hours. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 protein (P77), which were heat treated in the same way.
FIG. 3C illustrates a western blot showing the protein P77853-Tth-S158-7 (SEQ ID NO: 1674), which was heat treated at 37 °C (panel 7, left lane) or 55 °C (panel 7, middle lane) for four hours, and at 70 °C for one hour (panel 7, right lane). Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein.
FIG. 3D illustrates a western blot showing the protein P77853-Tth-S158-19 (SEQ ID NO: 1675), which was heat treated at 37 °C (panel 19, left lane) or 55 °C (panel 19, middle lane) for four hours, and at 70 °C for one hour (panel 19, right lane). Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein.
FIG. 3E illustrates a western blot showing the protein P77853-Tth-S158-20 (SEQ ID NO: 1676), which was heat treated at 37 °C (panel 20, left lane) or 55 °C (panel 20, middle lane) for four hours, and at 70 °C for one hour (panel 20, right lane). Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 protein (P77).
FIG. 3F illustrates a western blot showing the protein P77853-Tth-S158-21 (SEQ ID NO: 1677), which was heat treated at 37 °C (panel 21, left lane) or 70 °C (panel 21, right lane) for one hour. Also shown are lanes containing protein from the empty vector control (VCT) and the wild type P77853 protein (P77), which were heat treated in the same way.
FIG. 3G illustrates a western blot showing the protein P77853-Tth-S158-25 (SEQ ID NO: 1678), which was heat treated at 37 °C (panel 25, left lane) or 70 °C (panel 25, right lane) for one hour. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein, which were heat treated in the same way.
FIG. 3H illustrates a western blot showing the protein P77853-Tth-S158-38 (SEQ ID NO: 1679), which was heat treated at 37 °C (panel 38, left lane) or 55 °C (panel 38, right lane) for four hours. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein, which were heat treated in the same way.
FIG. 3I illustrates a western blot showing the protein P77853-Tth-S158-39 (SEQ ID NO: 1680), which was heat treated at 37 °C (panel 39, left lane) or 55 °C (panel 39, middle lane) for four hours, and at 70 °C for one hour (panel 39, right lane). Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 (P77) protein.
FIG. 3J illustrates a western blot showing the protein P77853-Tth-S158-42 (SEQ ID NO: 1681), which was heat treated at 37 °C (panel 42, left lane) or 55 °C (panel 42, middle lane) for four hours, and at 70 °C for one hour (panel 42, right lane). Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 protein.
FIG. 3K illustrates a western blot showing the protein P77853-Tth-S158-138 (SEQ ID NO: 1691), which was heat treated at 37 °C (left lane) or 59 °C (second lane from left) for four hours. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 protein (P77853).
FIG. 3L illustrates a western blot showing the proteins P77853-Tth-T134-1 (SEQ ID NO: 1629) (panel 1), P77853-Tth-T134-2 (SEQ ID NO: 1630) (panel 2), P77853-Tth-T134-3 (SEQ ID NO: 1631) (panel 3), P77853-Tth-T134-9 (SEQ ID NO: 1632) (panel 9), P77853-Tth-T134-91 (SEQ ID NO: 1644) (panel 91), P77853-Tth-T134-48 (SEQ ID NO: 1638) (panel 48), P77853-Tth-T134-80 (SEQ ID NO: 1640) (panel 80) and P77853-Tth-T134-95 (SEQ ID NO: 1645) (panel 95), which were heat treated at 37 °C (left lane in each panel) and 70 °C (right lane in each panel) for one hour. Also shown are lanes containing protein from the empty vector control (VCT) and wild type P77853 protein (P77), which were heat treated in the same way. The phenotype of each protein is given above its corresponding lanes.
FIGS. 4A to 4C illustrate western blot analyses of P77853 xylanase mutants modified with the S158 Tth intein.
FIG. 4A illustrates a western blot analysis of P77853 xylanase modified with the S158-19 Tth intein (SEQ ID NO: 1675). Protein samples were incubated at 59 °C for different amounts of time (0, 1, 2, 3, 4 and 6 hours). The empty vector (V) and wild type P77853 control samples are shown on the far right together with a molecular weight ladder. The gray area in the middle covers lanes that contained other samples.
FIG. 4B illustrates a western blot analysis of P77853 xylanase modified with the S158-30-103 Tth intein (SEQ ID NO: 1701). Protein samples were incubated at each of 37 °C, 50 °C, 59 °C and 65 °C for different amounts of time (1, 2, 3, 4 and 6 hours), as indicated. The empty vector (Vect) and wild type P77853 control samples are shown on the far right together with a molecular weight ladder.
FIG. 4C illustrates a western blot analysis of P77853 xylanase modified with the T134-100-101 Tth intein (SEQ ID NO: 1711). Protein samples were incubated at each of 37 °C, 50 °C, 59 °C and 65 °C for different amounts of time (1, 2, 4, 6 and 17 hours), as indicated. The empty vector (Vect) and wild type P77853 control samples are shown on the far right together with a molecular weight ladder.
FIG. 5 illustrates plasmid vectors for expression and secretion of intein-modified proteins, for example, endoglucanases derived from Acidothermus cellulolyticus, in yeast cells.
FIG. 6 illustrates activity assays for Pichia strains that express either P07981 (endoglucanase EG-1 from Trichoderma reesei), P54583 or albumin (as a negative control).
FIG. 7 illustrates a plate assay for secretion of P54583 from S. cerevisiae.
FIG. 8 illustrates the activity of P54583 at different pH levels and at different temperatures.
FIG. 9 illustrates the activity of P54583 over time and at different temperatures.
FIG. 10 illustrates a PNP-C assay for P54583.
FIG. 11 illustrates the purification of P54583 with microcrystalline cellulose.
FIG. 12 illustrates western detection of wild type P54583.
FIG. 13 illustrates candidate intein insertion sites for P54583.
FIG. 14 illustrates an assembly strategy for genes that encode intein-modified endoglucanases.
FIG. 15 illustrates the classification of the behavior of intein-modified endoglucanases in response to treatments at different temperatures.
FIG. 16 illustrates activity assays of intein-modified endoglucanases.
FIG. 17 illustrates a western blot analysis of various A5 modified P54583 proteins.
FIGS. 18A - C illustrate error-prone PCR to generate mutagenized libraries.
FIG. 19 illustrates the effect of a disabled intein on the enzyme activity of P54583.
FIG. 20 illustrates the recovery of enzyme activity by pre-incubation at various temperatures.
FIG. 21 illustrates the enzyme activity recovered from P54583 carrying a mini-intein at position S237 after pre-incubation at different temperatures.
FIG. 22 illustrates the relationship between pre-incubation time and activation of intein-modified endoglucanase. Each panel (1, 2, 3 and 4) includes bars representing incubation for 0, 2, 4, 6, 8 and 10 hours, presented consecutively from left to right.
FIG. 23 illustrates results of high-throughput endoglucanase assays for an intein-modified endoglucanase library.
FIG. 24 illustrates screening of a mutagenized intein-modified endoglucanase library.
FIG. 25 illustrates repeated activity assays of candidates from a mutagenized intein-modified endoglucanase library.
FIG. 26 illustrates heat-inducible enzyme activity of intein-modified endoglucanases carrying mutations at position R51 of the Tth intein.
FIG. 27 illustrates a phylogenetic tree of endoglucanases.
FIG. 28 illustrates a plasmid vector for expression and secretion of intein-modified proteins, for example, expression and secretion of a termite-derived endoglucanase in yeast.
FIG. 29 illustrates yeast that expresses an empty expression vector, an expression vector that encodes NtEG and an expression vector that encodes a mutant NtEG lacking the native signal peptide.
FIG. 30 illustrates endoglucanase activity of NtEG and of a mutant NtEG lacking the signal peptide over a temperature range.
FIG. 31 illustrates endoglucanase activity of a mutant NtEG lacking the native signal peptide and of P54583 over a pH range.
FIG. 32 illustrates endoglucanase activity of a mutant NtEG lacking the native signal peptide, with or without a His tag.
FIG. 33 illustrates a strategy for assembling genes encoding intein-modified NtEG endoglucanases.
FIG. 34 illustrates a time course of enzyme activity from yeast cells expressing intein-modified termite endoglucanases.
FIG. 35 illustrates the expression cassette in the λ ZAP® II vector.
FIGS. 36A - D illustrate switching assays at pH 6.5 for intein-modified P77853 in Example 15, for the T134 and S158 insertion sites. The set of inteins was inserted at position S158 (FIGS. 36A - B) and at position T134 (FIGS. 36C - D) of P77853. High and low temperature activities are plotted relative to wild type P77853 (FIGS. 36A and C). High temperature activities versus fold induction (high temperature activity / low temperature activity) are also plotted (FIGS. 36B and D). The inteins are shaded according to the thermophilicity of their host organism. The vertical dashed line represents 10% of low temperature wild type activity. The horizontal dashed line represents 40% of high temperature wild type activity.
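The quantities plotted in these switching assays can be computed from paired activity measurements. The Python sketch below assumes activities for a variant and for wild type P77853 in arbitrary but consistent units, uses the fold induction definition given above (high temperature activity / low temperature activity), and checks the region bounded by the two dashed lines, assuming low temperature activity is read against the vertical line. It also encodes the “Top Hits” criterion given for FIGS. 44A - D, read here as more than 40% of wild type high temperature activity, or more than 30% together with more than 2X switching; that reading, and all names in the code, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ActivityPair:
    low_temp: float   # activity after the low temperature treatment
    high_temp: float  # activity after the high temperature treatment


def fold_induction(sample: ActivityPair) -> float:
    """High temperature activity divided by low temperature activity."""
    if sample.low_temp <= 0:
        raise ValueError("low temperature activity must be positive")
    return sample.high_temp / sample.low_temp


def in_switching_region(sample: ActivityPair, wild_type: ActivityPair) -> bool:
    """Below 10% of wild type low temperature activity and above 40% of
    wild type high temperature activity (the two dashed lines)."""
    return (sample.low_temp < 0.10 * wild_type.low_temp
            and sample.high_temp > 0.40 * wild_type.high_temp)


def is_top_hit(sample: ActivityPair, wild_type: ActivityPair) -> bool:
    """Assumed reading of the Top Hits rule: >40% of wild type high
    temperature activity, or >30% with more than 2X switching."""
    fraction = sample.high_temp / wild_type.high_temp
    return fraction > 0.40 or (fraction > 0.30 and fold_induction(sample) > 2.0)


if __name__ == "__main__":
    wild_type = ActivityPair(low_temp=4.0, high_temp=5.0)  # made-up values
    variant = ActivityPair(low_temp=0.3, high_temp=2.5)
    print(in_switching_region(variant, wild_type))  # True
    print(is_top_hit(variant, wild_type))           # True
```

A variant can be a top hit without falling in the dashed-line region: with the same made-up wild type values, `ActivityPair(0.5, 1.75)` has 35% of wild type high temperature activity and 3.5-fold induction, but its low temperature activity is above the 10% line.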
FIGS. 37A to D illustrate switching assays at pH 7.5 for intein-modified P77853 in Example 15, for insertion sites T134 and S158. The set of inteins was inserted at position S158 (FIGS. 37A - B) and at position T134 (FIGS. 37C - D) of P77853. High and low temperature activities are plotted relative to wild type P77853 (FIGS. 37A and C). High temperature activities versus fold induction (high temperature activity / low temperature activity) are also plotted (FIGS. 37B and D). The inteins are shaded according to the thermophilicity of their host organism. The vertical dashed line represents 10% of low temperature wild type activity. The horizontal dashed line represents 40% of high temperature wild type activity.
FIGS. 38A - D illustrate the top activity candidates in Example 15. The set of inteins was inserted at position S158 (FIGS. 38A and C) and at position T134 (FIGS. 38B and D) of P77853. The activities following high temperature heat treatment (right bar for each sample) and low temperature heat treatment (left bar for each sample) at pH 6.5 (FIGS. 38A and B) and at pH 7.5 (FIGS. 38C and D) are plotted for the top 20 candidates with the highest activities, compared with the wild type and the empty vector. The dashed line between 2 and 4 on the Activity axis represents 40% of high temperature wild type activity. The dashed line below 2 represents 10% of low temperature wild type activity.
FIGS. 39A - D illustrate examples of different switching classes from Example 15. FIGS. 39A and C illustrate data for S158 P77853 intein inserts, and FIGS. 39B and D illustrate data for T134 P77853 intein inserts. FIGS. 39A and B correspond to heat treatments at pH 6.5. FIGS. 39C and D correspond to heat treatments at pH 7.5. The dashed line between 2 and 4 on the Activity axis represents 40% of high temperature wild type activity. The dashed line below 2 represents 10% of low temperature wild type activity.
FIG. 40 illustrates the reassessment of top 20 candidates (AS-146, AS-2, AS-79, AS-83) in Example 15 and their comparison with a low-performing candidate (AS-8), a positive control (P77853) and an empty vector control (pBS). The dashed line above 1 on the Activity axis represents 40% of high temperature wild type activity. The dashed line below 0.5 represents 10% of low temperature wild type activity.
FIG. 41 illustrates a western blot of top performing candidates at the S158 insertion site (AS-2, AS-79, AS-83 and AS-146) and at the T134 insertion site (AT-2, AT-83, AT-149 and AT-154) of P77853. pBS is the empty vector control, and P77 is the positive control (P77853). The left and right lanes for each sample designation represent the low temperature (37 °C / 4 hours) and heated (60 °C / 4 hours) aliquots from the same lysate, respectively. The arrows indicate the intein-modified P77853 precursors, and NC marks the position of the mature protein.
FIGS. 42A and B illustrate differences in activity and switching based on thermotolerance. The fraction of candidates showing high activity at high temperatures (FIG. 42A) and switching greater than 2X (FIG. 42B) is compared for inteins from thermophilic/hyperthermophilic organisms (right bar for each sample label) versus inteins from mesophilic organisms or organisms of unknown (UNK) thermophilicity (left bar for each sample label).
FIGS. 43A and B illustrate differences in activity and switching based on intein length. The fraction of candidates showing high activity at high temperatures (FIG. 43A) and switching greater than 2X (FIG. 43B) is compared for inteins <240 amino acids in length (left bar for each sample) and inteins >240 amino acids in length (right bar for each sample).
FIGS. 44A - D illustrate sequence similarity among the best candidates. FIGS. 44A and C illustrate the S158 P77853 intein inserts, and FIGS. 44B and D illustrate the T134 inserts. FIGS. 44A and B correspond to heat treatments at pH 6.5, and FIGS. 44C and D correspond to heat treatments at pH 7.5, for “Top Hits” (classified as >40% of wild type activity, or >30% of wild type activity and >2X switching) and “Non Hits” (the remaining sequences). FIGS. 44A - D show the fraction of similar sequences (E value < 1 × 10^-20) that are also among the best candidates (“Top Similar Hits”, left bar for each of the two sample labels) or the worst candidates (“Similar Hits”, right bar for each of the two sample labels).
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The methods of the embodiments herein may be substituted for or combined with other screening and application methods known to those skilled in the art. The phrase "at least one of" followed by a list of two or more items, such as "A, B or C", means any one of A, B or C individually, as well as any combination of them.
As used here, the word "extein" refers to the portion of a protein modified with intein that is not part of the intein.
As used here, the terms “amino terminal extein”, “N-terminal extein” or “N-extein” are synonymous and refer to an extein that is positioned before the N-terminal residue of the intein. The carboxy terminus of an amino terminal extein, N-terminal extein or N-extein is fused to the amino terminus of the intein in an assembled intein-modified protein.
As used here, the terms “carboxy terminal extein”, “C-terminal extein” or “C-extein” are synonymous and refer to an extein that is positioned after the C-terminal residue of the intein. The amino terminus of a carboxy terminal extein, C-terminal extein or C-extein is fused to the carboxy terminus of the intein in an assembled intein-modified protein.
As used here, the term "target protein" is a protein, into which an intein is inserted or which is a candidate for insertion of an intein. Prior to the insertion of intein, the respective portions of the target protein can be referred to as an extein, amino terminal extein or carboxy terminal extein, based on the desired insertion site.
A "target protein" can be an enzyme, and the term "target enzyme" means a "target protein" that is an enzyme.
As used here, “permissive” or “P” refers to an intein modification in which the intein-modified protein retains function after insertion of the intein, or the intein is cleaved or spliced from the protein to leave an extein or ligated protein that has function.
As used here, “non-permissive” or “NP” refers to an intein modification in which the intein-modified protein has reduced function after insertion of the intein.
As used here, the term "temperature sensitive" refers to an intein modification in which the intein-modified protein has greater function after exposure to a temperature or temperature range, or the intein is spliced from the protein to leave an extein or ligated protein with greater function after exposure to a temperature or temperature range.
As used here, the word "switching" refers to a change in activity of an intein-modified protein in response to a change in a physical or chemical condition. An intein modification that results in an intein-modified protein with a "switching" or "switcher" intein is non-permissive before the change of condition and permissive after the change of condition. Switching can take place based on the presence of the intein, the cleavage of the intein from an extein, or the cleavage of the intein and ligation of the exteins.
As used here, "temperature sensitive switch splicer" or "TSP" refers to an intein-modified protein in which the intein splices in response to an induction temperature or temperature range. The intein-modified protein may be non-permissive at temperatures other than the induction temperature or temperature range, and permissive after exposure to the induction temperature or temperature range.
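The permissive, non-permissive and switching definitions above can be summarized as a small decision rule. The Python sketch below is a hypothetical illustration: it assumes activity is measured before and after exposure to the induction condition and expressed as a fraction of the unmodified protein's activity, and it uses an arbitrary cutoff (`active_fraction`, default 0.5) to decide whether a protein "has function"; the definitions themselves do not specify any numerical cutoff.

```python
def classify_intein_modification(
    before: float, after: float, active_fraction: float = 0.5
) -> str:
    """Classify an intein modification from activities measured before and
    after the induction condition, each as a fraction of the unmodified
    protein's activity. The cutoff is a hypothetical choice.

    Returns "P" (permissive), "SW" (switching) or "NP" (non-permissive).
    """
    active_before = before >= active_fraction
    active_after = after >= active_fraction
    if active_before:
        return "P"   # retains function with the intein in place
    if active_after:
        return "SW"  # non-permissive before, permissive after the change
    return "NP"      # reduced function under both conditions
```

Activity alone cannot distinguish a TSP from other switching modifications; that call would additionally need evidence of splicing, such as appearance of the spliced NC band on a western blot.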
“Isolated nucleic acid”, “isolated polynucleotide”, “isolated oligonucleotide”, “isolated DNA” or “isolated RNA”, as used here, refer to a nucleic acid, polynucleotide, oligonucleotide, DNA or RNA separated from the organism from which it originates, or from the naturally occurring genome, location or molecules with which it is normally associated, or that was prepared through a synthetic process.
"Isolated protein", "isolated polypeptide", "isolated oligopeptide" or "isolated peptide", as used herein, refer to a protein, polypeptide, oligopeptide or peptide separated from the organism from which it originates, or from the naturally occurring location or molecules with which it is normally associated, or that was prepared through a synthetic process.
As used here, the word "variant" refers to a molecule that retains a biological activity, which is the same or substantially similar to that of the original sequence. The variant may come from the same species or from a different species or it may be a synthetic sequence based on a natural or precursor molecule.
Nucleic acids, nucleotide sequences, proteins or amino acid sequences referred to here can be isolated, purified, chemically synthesized or produced using recombinant DNA technology. All of these methods are well known in the art.
As used here, the phrase "operably linked" refers to the association of two or more biomolecules, or portions of one or more biomolecules, in a configuration relative to each other such that the normal function of the biomolecules can be performed. In relation to nucleotide sequences, “operably linked” refers to the association of two or more nucleic acid sequences, by enzymatic ligation or otherwise, in a configuration relative to each other such that the normal function of the sequences can be performed. For example, a nucleotide sequence that encodes a presequence or secretory leader is operably linked to a nucleotide sequence for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence; and a ribosome binding site is operably linked to a coding sequence if it is positioned to facilitate translation of the sequence.
Isolated proteins with controlled activity, isolated nucleic acids that encode the isolated proteins, methods for determining intein insertion sites and methods for controlling protein activity are provided. The proteins or nucleic acids can be provided in plants, microbes and other organisms. With such control, one or more of the proteins or nucleic acids could be used in the production of fuels, fibers, pasta, chemicals, sugars, textiles, pulp, paper, human food or animal feed. Preferably, the proteins or nucleic acids do not readily interfere with one or more of the growth, physiology or other performance characteristics of the expression host. The protein to be controlled can be an enzyme, but it could be any type of protein, including a non-enzyme, a structural protein or a hormone.
One way to control protein activity is with inteins, and the control can allow expression of an intein-modified protein with a predefined level of activity. Inteins are self-cleaving and self-ligating peptides. The collective attributes of being both self-cleaving and self-ligating are referred to as “self-splicing” or “splicing.” An intein cleaves itself from a protein and mediates the ligation of the protein sequences (exteins) from which it is cleaved, restoring the protein. An intein can be inserted internally into the protein sequence or fused terminally to the protein. Inserting an intein into a protein can allow control of the protein by providing a protein that exhibits one activity when the intein is present and another activity after cleavage or splicing of the intein. In some cases, the intein splicing reaction can be controlled by one or more of a variety of induction conditions. When an activity that is normally harmful to the host is reduced, the intein can protect the host from harmful growth, physiological or yield effects of expressing the protein. After protein expression, activity could be modified by exposing the modified protein to reaction conditions that induce intein splicing. The protein that results after splicing may show increased activity. In one embodiment, the intein modification is non-permissive at low temperatures and permissive at higher temperatures, such that the intein-modified protein is switched when the temperature is changed from the lower to the higher temperature. In some embodiments, however, the enzyme shows lower activity after cleavage and/or ligation. A nucleic acid that encodes the intein-modified protein can be codon optimized for expression in a plant. Target proteins that can be modified with an intein in the present embodiments include, but are not limited to, cell wall-degrading enzymes, lignocellulose-degrading enzymes, xylanases and cellulases.
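As a purely sequence-level illustration of internal intein insertion and splicing, the Python sketch below treats proteins as one-letter strings. It assumes the common convention that an intein inserted "at" a C, S or T residue is placed immediately before that residue, so the residue becomes the first residue of the C-extein; splicing is modeled simply as excising the intein and joining the N- and C-exteins. The chemistry of splicing is not modeled, and the function names and example sequences are hypothetical placeholders, not sequences from this disclosure.

```python
def insert_intein(target: str, intein: str, site: int) -> str:
    """Insert `intein` immediately before residue `site` (1-based) of `target`.

    The residue at `site` must be C, S or T; it becomes the first residue
    of the C-extein in the intein-modified precursor.
    """
    if not 1 <= site <= len(target):
        raise ValueError("site is outside the target sequence")
    if target[site - 1] not in "CST":
        raise ValueError("insertion site must be a C, S or T residue")
    n_extein = target[: site - 1]
    c_extein = target[site - 1 :]
    return n_extein + intein + c_extein


def splice(precursor: str, n_extein_length: int, intein_length: int) -> str:
    """Excise the intein and ligate the flanking N- and C-exteins."""
    n_extein = precursor[:n_extein_length]
    c_extein = precursor[n_extein_length + intein_length :]
    return n_extein + c_extein


if __name__ == "__main__":
    target = "MAGSKLW"  # hypothetical target protein
    intein = "CHHHHN"   # hypothetical placeholder intein sequence
    precursor = insert_intein(target, intein, 4)  # insert before S4
    print(precursor)                              # MAGCHHHHNSKLW
    print(splice(precursor, 3, len(intein)))      # MAGSKLW
```

In this model, splicing an intein inserted at a C/S/T site always regenerates the original target sequence, which mirrors the description of the intein mediating ligation of the exteins to restore the protein.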
All proteins described here can be target proteins for intein modification.
The target protein can be modified with an intein selected from Mth, Psp-Pol, mini Psp-Pol (mPsp-Pol), RecA, Tac, Tag, Tth, mini Tth, or derivatives thereof. Mth, Psp-Pol, mini Psp-Pol, RecA, Tac, Tag, Tth and mini Tth (mTth) may include the sequences of SEQ ID NOS: 2, 3, 4 - 87, 88, 89, 90, 91 and 92 - 103, respectively. However, an intein may come from another source or be a modified form of a natural intein.
Isolated intein-modified xylanases are provided. Embodiments of the intein-modified xylanases show different activity before and after intein cleavage or intein splicing. In one embodiment, intein cleavage or splicing is induced by exposing the intein-modified xylanase to an induction condition. The induction condition can be, but is not limited to, a temperature rise. The elevated temperature may be within, but is not limited to, the range of 50 - 70 °C, inclusive of 50 °C and 70 °C, or any sub-range between two integers within that range. The elevated temperature can be greater than or equal to any whole-number temperature within the range of 25 - 70 °C. The elevated temperature can be greater than or equal to 50 °C, 55 °C, 59.9 °C, 60 °C, 65 °C or 70 °C. A nucleic acid encoding an intein-modified xylanase is preferably, but not necessarily, codon optimized for expression in a plant. In one embodiment, an intein-modified xylanase can be expressed in a transgenic plant.
Isolated intein-modified cellulases are provided. Embodiments of the intein-modified cellulases show different activity before and after intein cleavage or intein splicing. In one embodiment, intein cleavage or splicing is induced by exposing the intein-modified cellulase to an induction condition. The induction condition can be, but is not limited to, a temperature rise. The elevated temperature may be within, but is not limited to, the range of 50 - 70 °C, inclusive of 50 °C and 70 °C, or any sub-range between two integers within that range. The elevated temperature can be greater than or equal to any whole-number temperature within the range of 25 - 70 °C. The elevated temperature can be greater than or equal to 45 °C, 50 °C, 55 °C, 60 °C, 62 °C or 65 °C. A nucleic acid encoding an intein-modified cellulase is preferably, but not necessarily, codon optimized for expression in a plant. In one embodiment, an intein-modified cellulase can be expressed in a transgenic plant.
Xylanases that may be target proteins include, but are not limited to, beta-1,4-xylanase 229B from Dictyoglomus thermophilum (accession number P77853, SEQ ID NO: 104), endo-1,4-beta-xylanase from Clostridium thermocellum (accession number P51584, SEQ ID NO: 105), an alkaline thermostable endoxylanase precursor from Bacillus sp. NG-27 (accession number O30700, SEQ ID NO: 106), endo-1,4-beta-xylanase from Thermomyces lanuginosus (accession number O43097, SEQ ID NO: 107), and a thermostable celloxylanase from Clostridium stercorarium (accession number P40942, SEQ ID NO: 108). Xylanases can be modified with one or more of several inteins, including, but not limited to, at least one selected from Mth, Psp-Pol, mini Psp-Pol, RecA, Tac, Tag, Tth, mini Tth, or derivatives thereof. In one embodiment, the Mth, Psp-Pol, mini Psp-Pol, RecA, Tac, Tag, Tth or mini Tth inteins have the sequence of SEQ ID NOS: 2, 3, 4 - 87, 88, 89, 90, 91 or 92 - 103, respectively. A single intein or multiple inteins can be inserted into one or more of multiple candidate sites in the xylanases.
Cellulases that may be target proteins include, but are not limited to, Clostridium thermocellum celK cellulase (accession number O68438 (SEQ ID NO: 109)), Thermomonospora fusca celB cellulase (accession number P26222 (SEQ ID NO: 110)), Ace1 endoglucanase E1 from Acidothermus cellulolyticus (accession number P54583 (SEQ ID NO: 111)), and Nasutitermes takasagoensis NtEG cellulase (accession number O77044 (SEQ ID NO: 112)). Cellulases can be modified with one or more of several inteins, including, but not limited to, at least one selected from Mth, Psp-Pol, mini Psp-Pol, RecA, Tac, Tag, Tth, mini Tth, or derivatives thereof. In one embodiment, the Mth, Psp-Pol, mini Psp-Pol, RecA, Tac, Tag, Tth or mini Tth inteins have the sequence of SEQ ID NOS: 2, 3, 4 - 87, 88, 89, 90, 91 or 92 - 103, respectively. A single intein or multiple inteins can be inserted into one or more of multiple candidate sites in the cellulases.
An intein-modified protein can be produced by standard molecular biology techniques and then screened. The intein, the target protein or the intein-modified protein can be subjected to mutagenesis and then screened. Screening systems that can be used include lambda phage, yeast or other expression systems that allow production of the protein and/or testing of its physical and/or functional characteristics. From an intein-modified protein, or from a population of mutant intein-modified proteins, candidates can be isolated and further analyzed. Further analysis may include DNA sequencing, functional assays, enzyme activity assays and monitoring of changes in activity, structure or splicing in response to induction conditions.
Induction conditions may include exposing the intein-modified protein to changes in physical or chemical conditions, such as, but not limited to, changes in temperature, pH, concentration of splicing inhibitors, ligand concentration, light, salinity and pressure. Natural or mutant inteins can be screened to determine their induction conditions. In addition, inteins can be derived from organisms adapted to life under a desired induction condition. For example, temperature-induced inteins can be isolated from psychrophiles, mesophiles or thermophiles (for example, Nanoarchaeum equitans, Pyrococcus abyssi or Pyrococcus sp.); pH-induced inteins can be isolated from acidophiles, alkaliphiles or neutrophiles (for example, Pyrococcus sp., Mycobacterium tuberculosis, Saccharomyces cerevisiae); and salt-induced inteins can be isolated from halophiles. Chemically induced or inhibited inteins have also been identified. As non-limiting examples of chemically induced or inhibited inteins, the vacuolar ATPase (VMA) subunit intein from Saccharomyces cerevisiae cleaves inducibly upon exposure to DTT, NH2OH or cysteine; and inteins isolated from Mycobacterium and others from Saccharomyces have been shown to exhibit splicing inhibition in the presence of Zn2+. Inhibited inteins can be induced by removing the inhibiting condition. Natural inteins can also be mutated and screened to determine whether the mutation(s) resulted in an intein that is inducible under a desired induction condition. An intein from any of these sources can be supplied in an intein-modified protein.
Intein insertion sites can be determined experimentally. To determine whether an insertion site will allow intein splicing, the intein-protein fusion gene can be constructed and cloned using methods known in the art, the intein-modified protein can be expressed, and the intein-modified protein can be tested for its ability to splice spontaneously or under induction conditions.
To avoid adding any extra amino acids to the protein, and thereby potentially altering the protein's function or activity, native cysteines, serines and threonines that occur within a protein can be selected as potential intein insertion sites. After insertion, the protein can be tested for changes in its function before and after intein cleavage and/or ligation.
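Enumerating the native candidate sites described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `cts_sites` is hypothetical, and each returned 0-based index is the residue that would become the first residue of the C-extein, with the intein placed immediately before it.

```python
from typing import List


def cts_sites(protein: str) -> List[int]:
    """Return 0-based indices of native Cys, Thr and Ser residues.

    Each index is a candidate junction: the intein would be placed
    immediately before this residue, which then becomes the first
    residue of the C-extein.
    """
    return [i for i, aa in enumerate(protein.upper()) if aa in "CTS"]


print(cts_sites("MKTAYCSGL"))  # [2, 5, 6]
```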
Inteins can be inserted into a protein at any location by adding a cysteine, serine or threonine at the new junction site. The cysteine, serine or threonine can be added either by replacing an amino acid within the protein sequence or by inserting the cysteine, serine or threonine. When an intein is inserted at the new junction site, the carboxy terminus of the intein will be fused to the first amino acid at the amino terminus of the C-extein. If an additional cysteine, serine or threonine is placed in a protein to facilitate intein insertion, that amino acid will remain in the protein after the splicing reaction. Additional amino acids left in a mature protein after the splicing reaction can interfere with the function or activity of the protein, so the function and activity of any protein that results from such a splicing reaction and contains an additional amino acid can be confirmed. Functional assays for determining the function of any known protein that has been assigned a function are known in the art.
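The bookkeeping in this paragraph can be illustrated by treating sequences as strings. The sketch below is a deliberately idealized model rather than a description of the chemistry: all function names are hypothetical, `"XXXX"` is a placeholder standing in for a real intein sequence, and splicing is reduced to excising the intein and joining the exteins. It shows that a junction residue added to enable insertion remains in the spliced product.

```python
def add_junction_residue(protein: str, pos: int, residue: str = "C",
                         substitute: bool = False) -> str:
    """Create a new junction at `pos` by substituting or inserting C, S or T."""
    if residue not in ("C", "S", "T"):
        raise ValueError("junction residue must be C, S or T")
    tail = protein[pos + 1:] if substitute else protein[pos:]
    return protein[:pos] + residue + tail


def insert_intein(protein: str, pos: int, intein: str) -> str:
    """Place the intein immediately before protein[pos], the first C-extein residue."""
    if protein[pos] not in "CST":
        raise ValueError("the C-extein must begin with C, S or T")
    return protein[:pos] + intein + protein[pos:]


def splice(fusion: str, pos: int, intein_len: int) -> str:
    """Idealized splicing: remove the intein and join N-extein to C-extein."""
    return fusion[:pos] + fusion[pos + intein_len:]


native = "MKAAGL"
edited = add_junction_residue(native, 2)   # Cys inserted: "MKCAAGL"
fusion = insert_intein(edited, 2, "XXXX")  # placeholder intein: "MKXXXXCAAGL"
print(splice(fusion, 2, 4))                # MKCAAGL (the added Cys remains)
```

Passing `substitute=True` models the replacement route instead, which keeps the protein length unchanged but alters one native residue.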
Because many proteins contain multiple cysteines, serines and threonines, it may be desirable to rank, or even limit, the insertion sites that are tested for intein splicing. Three characteristics that can be used to predict an intein insertion site are: A) the local sequence, as scored by a support vector machine (SVM); B) the distance from the insertion site to the active site residues; and C) the proximity of the insertion site to a local secondary structure element (for example, at or near the end of an alpha-helix or beta-sheet). In one embodiment, the local sequence and the distance to the active site are used to narrow the selection of proposed insertion sites, while information about secondary structure elements can be used to prioritize among otherwise similar insertion sites.
A) The Local Sequence
An SVM method can be used to predict or evaluate intein insertion sites. A suitable training set of known intein insertion sites can be assembled from native intein insertion sites. Sequences of known intein insertion sites for this purpose can be found in the NEB InBase database, as described in Perler, F. B. (2002), InBase, the Intein Database, Nuc. Acids Res. 30: 383-384, which is incorporated here in its entirety as if fully set forth. Preferably, the intein insertion sites of the training set have the sequences of SEQ ID NOS: 1233 - 1512. One source of protein sequences for this purpose is the NCBI database, but many other sources are available. The intein-containing proteins corresponding to the training-set intein insertion sites of SEQ ID NOS: 1233 - 1512 have the sequences of SEQ ID NOS: 393 - 672, respectively. Based on the intein sequences (SEQ ID NOS: 113 - 392) and the sequences of the intein-containing proteins (SEQ ID NOS: 393 - 672), the extein sequences of each intein-containing protein can be separated from each intein sequence. The N-exteins of the protein sequences of SEQ ID NOS: 393 - 672 are shown in SEQ ID NOS: 673 - 952, respectively, and the C-exteins of the protein sequences of SEQ ID NOS: 393 - 672 are shown in SEQ ID NOS: 953 - 1232, respectively. To generate the SVM sequence prediction, a cassette is determined that includes the insertion site X and the sequence surrounding X in the N- and C-exteins. Preferably, the analyzed sequence is a cassette of amino acids -3 to +2 (6 amino acids in total, numbered -3, -2, -1, 0, 1, 2) surrounding X (a sequence of NNNXNN, in which X is amino acid 0). The following description applies to the NNNXNN cassette as a model for the SVM. If a cassette other than NNNXNN is used, the SVM will be modified accordingly, as will be evident from the description here.
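Extracting the NNNXNN cassette can be sketched as a string slice. This is a minimal illustration with hypothetical function names: indices are 0-based, windows that would run past either end of the sequence are rejected, and for a native site the cassette is assumed to span the last three N-extein residues plus the first three C-extein residues, X being the first C-extein residue.

```python
from typing import Optional


def cassette(protein: str, x: int) -> Optional[str]:
    """Return the NNNXNN window (positions -3..+2) around residue index x.

    Returns None if the window would run past either end of the sequence.
    """
    start, end = x - 3, x + 3  # -3..+2 inclusive is a 6-residue slice
    if start < 0 or end > len(protein):
        return None
    return protein[start:end]


def native_site_cassette(n_extein: str, c_extein: str) -> str:
    """Cassette for a native insertion site: last 3 N-extein residues
    plus first 3 C-extein residues (X is the first C-extein residue)."""
    if len(n_extein) < 3 or len(c_extein) < 3:
        raise ValueError("each extein must contain at least 3 residues")
    return n_extein[-3:] + c_extein[:3]


print(cassette("MKTAYCSGLE", 5))               # TAYCSG
print(native_site_cassette("MKTAY", "CSGLE"))  # TAYCSG
```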
The cassette is converted into a vector V using the following equations:

$$V = [\,\mathrm{site}_{-3}\ \ \mathrm{site}_{-2}\ \ \mathrm{site}_{-1}\ \ \mathrm{site}_{0}\ \ \mathrm{site}_{+1}\ \ \mathrm{site}_{+2}\,]$$

$$\mathrm{site}_i = [\,aa_{i,\mathrm{ALA}}\ \ aa_{i,\mathrm{ARG}}\ \ \ldots\ \ aa_{i,\mathrm{TRP}}\ \ aa_{i,\mathrm{TYR}}\,]$$

where $aa_{i,N} = 1$ if amino acid type N is present at site i, and $aa_{i,N} = 0$ otherwise. This converts the six-amino-acid cassette sequence into a 1 by 120 vector (6 sites × 20 amino acid types). The insertion site cassettes for the intein-containing proteins of SEQ ID NOS: 393 - 672 are provided in SEQ ID NOS: 1233 - 1512, respectively. This set of vectors for insertion site cassettes is used as the set of true positive controls for training the SVM. From each protein with a true positive, three random NNNXNN cassettes with cysteine, threonine or serine (referred to here as "C/T/S") at position X (0), but without an intein insertion, are also chosen from the N- and C-extein sequences (preferably from SEQ ID NOS: 673 - 1232) as true negatives. The set of true negatives from the extein sequences is then compiled. A selected true negative can come from the same protein as the true positive insertion site and have the same type of residue at position X as the true positive.
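The one-hot encoding defined by these equations can be sketched as follows. The column order within each 20-wide block is an assumption: the source shows only "ALA, ARG, ..., TRP, TYR", so this sketch simply uses the alphabetical one-letter order; any fixed order works as long as training and scoring use the same one.

```python
from typing import List

# Assumed column order within each 20-wide block (one-letter alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_cassette(cas: str) -> List[int]:
    """One-hot encode a 6-residue NNNXNN cassette as a 1 x 120 vector."""
    if len(cas) != 6:
        raise ValueError("expected a 6-residue cassette")
    vec: List[int] = []
    for aa in cas.upper():
        block = [0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(aa)] = 1  # raises ValueError for non-standard residues
        vec.extend(block)
    return vec


v = encode_cassette("TAYCSG")
print(len(v), sum(v))  # 120 6
```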
The full SVM for predicting intein insertion sites is trained on the entire set of intein insertion site sequences, after removing any sequences that are identical. This can be done with any one of a number of different methods or programs. One SVM program that can be used to predict intein insertion sites is SVMlight V6.02 (August 14, 2008), which is incorporated by reference as if fully set forth and is available from Thorsten Joachims Weichgut LLC, Ithaca, NY. See also Thorsten Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges and A. Smola (eds.), MIT Press, 1999, which is incorporated by reference as if fully set forth. Briefly, SVMlight V6.02 is an implementation of the support vector machine training method of the 1999 Joachims publication referenced above, which addresses the difficulty of the large training sets associated with large-scale problems. The algorithm is based on a decomposition strategy that handles these issues by selecting the variables of the working set efficiently. With SVMlight V6.02, a linear kernel and a cost factor set to 1 are used, so errors on the positive and negative sets are weighted equally.
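SVMlight reads training examples as plain text, one example per line in the form `<target> <feature>:<value> ...`, with 1-based feature indices in increasing order and zero-valued features omitted. The sketch below, a minimal illustration rather than the authors' pipeline, turns cassettes into such lines (+1 for insertion sites, -1 for C/T/S non-sites) and drops identical cassettes within each class. The one-letter alphabetical column order is an assumption, since the source lists only "ALA, ARG, ..., TRP, TYR".

```python
from typing import Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order within each block


def encode_cassette(cas: str) -> List[int]:
    """One-hot encode a 6-residue cassette into 120 features."""
    vec: List[int] = []
    for aa in cas.upper():
        block = [0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(aa)] = 1
        vec.extend(block)
    return vec


def svmlight_line(label: int, vec: List[int]) -> str:
    """Format one example as '<target> <feature>:<value> ...' (1-based, sparse)."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(vec, start=1) if v)
    return f"{label} {feats}"


def training_lines(positives: Iterable[str], negatives: Iterable[str]) -> List[str]:
    """+1 lines for insertion-site cassettes, -1 lines for C/T/S non-sites.

    Identical cassettes within each class are written only once.
    """
    lines: List[str] = []
    for label, cassettes in ((1, positives), (-1, negatives)):
        seen = set()
        for cas in cassettes:
            if cas not in seen:
                seen.add(cas)
                lines.append(svmlight_line(label, encode_cassette(cas)))
    return lines


for line in training_lines(["AAAAAA", "AAAAAA"], ["GGGCGG"]):
    print(line)
```

Written to a file (the name `train.dat` would be purely illustrative), these lines could be passed to SVMlight's `svm_learn` with `-t 0` (linear kernel) and `-j 1` (cost factor of 1, so errors on positive and negative examples are weighted equally).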
To test the validity of this method, smaller sets of insertion site cassettes can be chosen for training and testing as follows: 1) a random set of m positive training-set insertion sites with unique sequences is selected (in one embodiment, m ranges from 1 to 250, and the sequences are selected from SEQ ID NOS: 1233 - 1512); 2) for each true positive insertion site, corresponding true negative cassettes are selected at random from the exteins of the same intein-containing protein associated with the true positive insertion site (in one embodiment, SEQ ID NOS: 673 - 1232), the true negatives having the same central amino acid X but no intein insertion; and 3) the unique sequences remaining in the group that were not selected in step 1), for example, those remaining in SEQ ID NOS: 1233 - 1512, can be selected as the test set. The support vectors are then used to classify the test set, which consists of positives from the cassettes of known insertion sites and negatives from all other cassettes of non-insertion sites in the exteins (SEQ ID NOS: 673 - 1232) with cysteine, threonine or serine at position 0.
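The three-step split can be sketched in Python. This is an illustrative reading, not the authors' code, under these assumptions: each record (a hypothetical structure) pairs a true-positive cassette with the C/T/S cassettes from the same protein's exteins; the default of three negatives per positive follows the "three random NNNXNN cassettes" mentioned for true negatives; and test-set negatives are simplified to the C/T/S extein cassettes of the held-out proteins.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (true-positive cassette, extein C/T/S cassettes without an intein insertion)
Record = Tuple[str, Sequence[str]]


def split_records(records: Sequence[Record], m: int, n_neg: int = 3,
                  seed: int = 0) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (train, test) lists of (cassette, label) pairs, label +1 or -1."""
    rng = random.Random(seed)
    unique: Dict[str, Sequence[str]] = {}
    for pos, negs in records:
        unique.setdefault(pos, negs)  # keep one record per unique site sequence
    keys = list(unique)
    rng.shuffle(keys)

    train: List[Tuple[str, int]] = []
    for pos in keys[:m]:  # step 1: m random unique positives
        train.append((pos, 1))
        # step 2: random negatives from the same protein with the same X
        same_x = [c for c in unique[pos] if c[3] == pos[3]]
        train.extend((c, -1) for c in rng.sample(same_x, min(n_neg, len(same_x))))

    test: List[Tuple[str, int]] = []
    for pos in keys[m:]:  # step 3: remaining unique positives form the test set
        test.append((pos, 1))
        test.extend((c, -1) for c in unique[pos] if c[3] in "CTS")
    return train, test
```

Fixing `seed` makes a run reproducible; repeating the split with different seeds gives the independent runs used to average results for each training set size.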
The scores for the collection of sites in each protein are then compared, and the insertion sites are ranked according to their scores. To create a measure for comparison, each intein insertion site can be assigned a number calculated as the number of sites with a lower SVM score than the insertion site (L) divided by the number of all other sites in the test set (N - 1), that is, L/(N - 1). A measure of 1 would mean that the insertion site scores higher than all other sites, while a measure of 0 would mean that it scores lower than all other sites. This process can be repeated 25 times for each training set size, with each run based on a random selection of insertion site cassettes from SEQ ID NOS: 1233 - 1512, and the corresponding true negative sites selected from the corresponding SEQ ID NOS: 673 - 1232, to be used for training and testing. Table 1 below shows the measure for known intein insertion sites using this training and testing procedure. The mean measure for the known intein insertion sites and the standard deviations for each training set size in Table 1 are based on the preferred embodiment, with training and test set sequences selected from SEQ ID NOS: 673 - 1512. For training sets of size 25 or higher, the intein insertion site has, on average, a measure of 0.75. This proved to be statistically significant, with an approximate p value of 10^-10 for a training set of 150 insertion site cassettes. Potential intein insertion sites for any target protein can be selected by using the SVM to predict, based on local sequence characteristics, insertion sites that can be used to modify the activity of the target protein. In one embodiment, candidate insertion sites with a rating of 0.75 or higher are chosen as the locations for inserting an intein.
Table 1

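The L/(N - 1) measure can be written as a short function. This sketch is illustrative and the names are hypothetical; ties are counted as "not lower", and the helper that applies the 0.75 cutoff assumes that the candidate-site rating is on this same L/(N - 1) scale, which the text implies but does not state outright.

```python
from typing import List, Sequence


def rank_metric(scores: Sequence[float], site: int) -> float:
    """L / (N - 1): fraction of the other candidate sites that score below `site`."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two candidate sites")
    lower = sum(1 for i, s in enumerate(scores) if i != site and s < scores[site])
    return lower / (n - 1)


def candidates_at_or_above(scores: Sequence[float], cutoff: float = 0.75) -> List[int]:
    """Indices whose rank metric meets the cutoff (assumes the rating uses this scale)."""
    return [i for i in range(len(scores)) if rank_metric(scores, i) >= cutoff]


print(rank_metric([0.9, 0.1, 0.5], 0))                   # 1.0
print(candidates_at_or_above([0.1, 0.2, 0.3, 0.4, 0.5]))  # [3, 4]
```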
A preferred set of NNNXNN cassettes includes those having the sequence of PGATSP, PGATVP, GAKSLG, PGATSL, PGASPL, PGATGP, AQRSLG, NQPSIV, NQASIV, PNMSSA, GNHSSG, PSHSAY, SLMSSC, TNTSNY, IDTSRN, PSTSAY, QIKSLY, FESN, , MWGTLR, LSASSY, FAQTQI, GGRSFV, SFVCGF, GFGSNP, NPPTRP, HHRSSS, HRSSSC, RSSSCP, DWNTFN, TFNSPD, DDRSDY, EVATDY, NQVTEL, SSVTFW, LRESVW, LRESVW, RF, , GHQTHI, MRNSPW, RFHTLV, DYNTDD, DKYSWL, LDMSIY, HNQTPT, DIKSWD, WGISDK, SGATDL, GGKCGG, GGKSGG, GGKTGG, YYDY, GYFSSG, NGNSYL, YGWTRN, YDPSSG, LGKTTR, YFSSGY, IDHTDS, SWSTNE, HTDSWS, NEITIN, DSWSTN, LDQSYV, EDPTIT, SYVTGY, PWGSNS, GSNSFI, TPGSGG, LMS, DGS, SMS, DGS, DGS, ATNTSN, CDPSGR, PQGTWF, VIDTSR, QGLTSL, SGQSAL, NGDSYW, SGDTGG, GVQSYN, LVYSAH, EFGTTL, FQWTFW, TFWSWN, NPDSGD, GYQSSG, IVESWG, GTTN, , YGWSTN, YQSSGS, SNASGT or DGGTYD (SEQ ID NOS: 1513-1628, respectively).
B) The Distance from the Insertion Site to the Active Site Residues
Although intein insertion at any point in a protein is contemplated, an intein insertion site can be selected to be close to the active site of the protein. As shown in FIG. 1, intein insertion sites within 25 angstroms of the active site were found to be more common than those farther away. In FIG. 1, the distance between the insertion site and the active site is measured between i) the atom of the insertion-site amino acid that is closest to the active site and ii) the atom of the active site that is closest to the insertion-site amino acid. An intein can be inserted at a position that is less than or equal to 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 angstroms away from the active site. In one embodiment, an intein insertion site is located 10 angstroms or less from the active site of the target protein. As used here, "within 10 angstroms" means 10 angstroms or less. The insertion site can be separated from the active site in the primary or secondary structure of the protein; the distance is measured as physical distance rather than as a number of amino acids or secondary structure elements. To determine the distance from the insertion-site residue to the active site, protein characteristics can be obtained by reference to published data or to crystallographic, nuclear magnetic resonance or homology models. Homology models can be constructed using SWISS-MODEL (SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Guex, N. and Peitsch, M.C. (1997) Electrophoresis 18, 2714-2723, which is incorporated here by reference as if fully set forth) with default parameters. Active site residues can be identified by reference to the literature for the specific protein, or by using the annotation of active site positions in the NCBI GenPept files of the National Center for Biotechnology Information databases (David L.
Wheeler, Tanya Barrett, Dennis A. Benson, Stephen H. Bryant, Kathi Canese, Vyacheslav Chetvernin, Deanna M. Church, Michael DiCuccio, Ron Edgar, Scott Federhen, Lewis Y. Geer, Yuri Kapustin, Oleg Khovayko, David Landsman, David J. Lipman, Thomas L. Madden, Donna R. Maglott, James Ostell, Vadim Miller, Kim D. Pruitt, Gregory D. Schuler, Edwin Sequeira, Steven T. Sherry, Karl Sirotkin, Alexandre Souvorov, Grigory Starchenko, Roman L. Tatusov, Tatiana A. Tatusova, Lukas Wagner and Eugene Yaschenko (2007) Nucl. Acids Res. 35: D5-D12, which is incorporated by reference as if fully set forth), the Catalytic Site Atlas database (The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Craig T. Porter, Gail J. Bartlett and Janet M. Thornton (2004) Nucl. Acids Res. 32: D129-D133; Analysis of Catalytic Residues in Enzyme Active Sites. Gail J. Bartlett, Craig T. Porter, Neera Borkakoti and Janet M. Thornton (2002) J Mol Biol 324: 105-121; Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. James W. Torrance, Gail J. Bartlett, Craig T. Porter, Janet M. Thornton (2005) J Mol Biol 347: 565-81, which are hereby incorporated by reference as if fully set forth), and other sources of information on active sites. Intein insertions at or near other protein sites, such as, but not limited to, allosteric effector sites, are also contemplated. An insertion site at or near another protein site may be, but is not limited to being, less than or equal to 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 angstroms away from the other site.
C) The Proximity of the Insertion Site to a Local Secondary Structure
Intein insertion sites can occur within any type of local secondary structure. In one embodiment, the intein insertion site is near a loop-β-sheet junction or a loop-α-helix junction. As used in this context, "near" means that the insertion site is within ten amino acids of a loop-β-sheet junction or a loop-α-helix junction. As used here, an insertion site "within ten amino acids" of a loop-β-sheet junction or loop-α-helix junction means that the insertion site is located before the amino acid that is 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 amino acids away from, or at, the loop-β-sheet junction or loop-α-helix junction. An intein can be inserted within 2 amino acids of a loop-β-sheet junction or within 2 amino acids of a loop-α-helix junction. As used here, "within 2 amino acids" means that the intein is inserted before an amino acid that is 2 or 1 amino acids away from either the loop-β-sheet junction or the loop-α-helix junction. Additional secondary structure locations at which an intein can be inserted include, but are not limited to, at or near the middle of a β-sheet, at or near the middle of an α-helix, or at or near the middle of a loop.
Intein Insertion Site Prediction Summary
Based on one or more of A) the local sequence as scored by the SVM, B) the distance from the site to the active site residues, and C) the proximity of the insertion site to a local secondary structure (for example, a loop-β-sheet junction or a loop-α-helix junction), intein insertion sites that can be used to control protein activity can be predicted and then tested experimentally. On average, the SVM model ranks an insertion site that can be used to control protein activity within the top 25% of all sites. Intein insertion sites can be positioned at or within 1 angstrom of active site residues. The local secondary structure of intein insertion sites can be at or near the junction of a loop with a β-sheet or with an α-helix.
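The three criteria summarized above can be combined in a short Python sketch. It is an illustrative reading under stated assumptions, not the authors' implementation. Atomic coordinates are assumed to be already parsed from a crystal structure or homology model into (x, y, z) tuples. The secondary-structure string is assumed to use one code per residue (H for helix, E for strand, anything else treated as loop, as in a reduced DSSP-style assignment). "Within ten amino acids" is read as an index difference of at most 10 on either side of a junction. Following the earlier statement that the local sequence and active-site distance narrow the list while secondary structure prioritizes it, the SVM rating and distance act as filters and junction proximity sets the order. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def min_atom_distance(site_atoms: Sequence[Coord], active_atoms: Sequence[Coord]) -> float:
    """B) Closest approach between any insertion-site atom and any active-site atom."""
    return min(math.dist(a, b) for a in site_atoms for b in active_atoms)


def junctions(ss: str) -> List[int]:
    """C) Index of the first residue after each loop/strand or loop/helix boundary.

    Direct helix/strand boundaries (with no loop) are not counted.
    """
    state = [c if c in "HE" else "L" for c in ss]
    return [i for i in range(1, len(ss))
            if state[i] != state[i - 1] and "L" in (state[i], state[i - 1])]


def junction_offset(ss: str, site: int) -> float:
    """Residues between `site` and the nearest junction (infinite if there is none)."""
    return min((abs(site - j) for j in junctions(ss)), default=math.inf)


@dataclass
class Site:
    index: int               # residue index of the C/T/S junction residue
    svm_rank: float          # A) SVM rating on the L/(N - 1) scale
    distance: float          # B) angstroms to the nearest active-site atom
    junction_offset: float   # C) residues to the nearest qualifying junction


def prioritize(sites: List[Site], min_rank: float = 0.75, max_distance: float = 10.0,
               max_offset: int = 10) -> List[Site]:
    """Narrow by A and B, then order by C (near-junction sites first)."""
    kept = [s for s in sites if s.svm_rank >= min_rank and s.distance <= max_distance]
    # False sorts before True, so sites within max_offset of a junction come first;
    # ties are broken by closer junctions, then by higher SVM rating.
    return sorted(kept, key=lambda s: (s.junction_offset > max_offset,
                                       s.junction_offset, -s.svm_rank))


ss = "LLLLEEEEELLLHHHHHLL"
print(junctions(ss))           # [4, 9, 12, 17]
print(junction_offset(ss, 6))  # 2
print(min_atom_distance([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
                        [(8.0, 0.0, 0.0), (0.0, 20.0, 0.0)]))  # 3.0
sites = [Site(10, 0.90, 5.0, 2), Site(20, 0.80, 8.0, 0), Site(30, 0.95, 12.0, 0),
         Site(40, 0.50, 3.0, 1), Site(50, 0.99, 9.0, 15)]
print([s.index for s in prioritize(sites)])  # [20, 10, 50]
```

In this example, site 30 is dropped for lying 12 Å from the active site and site 40 for its low SVM rating. Site 50 is kept but ranked last because it lies 15 residues from the nearest junction.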
After an insertion site is predicted, the protein can be modified with an intein and screened. Screening may include functional assays to determine whether an intein-modified protein has a permissive, non-permissive, condition-sensitive permissive, temperature-sensitive permissive or switching phenotype. Screening may include physical assays to determine whether the intein in the intein-modified protein has spliced, cleaved or remained within the intein-modified protein upon construction or after exposure to induction conditions. Western blots can be used to determine whether the intein in the intein-modified protein has spliced, cleaved or remained within the protein. A combination of functional and physical assays can be employed to determine whether the intein-modified protein is a condition-sensitive switch. The combination of functional and physical assays can be used to determine whether the intein-modified protein is a temperature-sensitive switch by constructing the protein, exposing it to an induction temperature and conducting the functional and physical assays.
A protein modified with intein can be constructed without using the prediction method by inserting an intein before any C / S / T position. The C / S / T position can be natural or introduced.
A sequence encoding an intein-modified protein can be mutated. Mutations can be made in intein-encoding sequences, extein-encoding sequences, or a combination thereof. Mutated intein-modified proteins can then be constructed and screened by functional and/or physical assays.
In one embodiment, an isolated protein is provided having a sequence with at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any one of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322. In one embodiment, a protein having less than 100% identity to its corresponding amino acid sequence of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 is a variant of the referenced protein or amino acid sequence. In one embodiment, an isolated protein, polypeptide, oligopeptide or peptide is provided having a sequence with at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 along a length of 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500, 10 to 600, 10 to 700, 10 to 800, 10 to 900 or 10 to all amino acids of a protein having the sequence of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322. This list of sequence lengths encompasses every full-length protein in SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 and every shorter length within the list, even for proteins that do not include more than 900 amino acids. For example, lengths of 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400 and 10 to all amino acids would apply to a 453 amino acid sequence. A range of amino acid sequence lengths mentioned here includes every length within the range, including the end points. The recited length of amino acids can start at any single position within a reference sequence at which enough amino acids follow to accommodate the recited length. For sequences of 1,000 amino acids or greater, the range of sequence lengths can be extended in increments of 10 to 100N amino acids, where N is an integer of ten or greater.
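The list of recited lengths can be made concrete with a small helper. The source does not give a procedure; this sketch simply reproduces the 453-residue example and the 100N extension for longer proteins, and the function name is hypothetical. Each tuple is an inclusive (minimum, maximum) length, with (6, 6) standing for the single length of 6 and the final entry standing for "10 to all amino acids".

```python
from typing import List, Tuple


def length_ranges(protein_len: int) -> List[Tuple[int, int]]:
    """(min, max) residue-length ranges that apply to a protein of `protein_len` residues."""
    uppers = [50, 100, 150] + list(range(300, 1000, 100))  # 50, 100, 150, 300 ... 900
    n = 10
    while 100 * n <= protein_len:  # 1,000 residues and longer: 10 to 100N, N >= 10
        uppers.append(100 * n)
        n += 1
    ranges = [(6, 6)] if protein_len >= 6 else []
    ranges += [(10, u) for u in uppers if u <= protein_len]
    if protein_len >= 10 and ranges[-1] != (10, protein_len):
        ranges.append((10, protein_len))  # "10 to all amino acids"
    return ranges


print(length_ranges(453))
# [(6, 6), (10, 50), (10, 100), (10, 150), (10, 300), (10, 400), (10, 453)]
```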
Identity can be measured by the Smith-Waterman algorithm (Smith TF, Waterman MS (1981), "Identification of Common Molecular Subsequences", Journal of Molecular Biology 147: 195-197, which is incorporated herein by reference in its entirety as if fully set forth). Peptides, oligopeptides or polypeptides with amino acid sequences shorter than the full length of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 can be used for numerous applications, including, but not limited to, raising an antibody to detect an intein-modified protein or a fragment thereof. The antibody can be used to detect whether an intein-modified protein, or a fragment thereof, is expressed in a plant, a plant tissue, a plant cell or a subcellular plant region or compartment. One embodiment provides an antibody that recognizes an epitope in an isolated amino acid sequence having at least 90% identity to 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500, 10 to 600, 10 to 700, 10 to 800, 10 to 900 or 10 to all contiguous amino acid residues of a protein having the sequence of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322.
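The Smith-Waterman algorithm finds the best-scoring local alignment by dynamic programming and then traces back from the highest-scoring cell until a zero cell is reached. The sketch below is a minimal, unoptimized version with an assumed toy scoring scheme (match +2, mismatch -1, linear gap penalty -2). The source does not specify scoring parameters, and protein comparisons in practice usually use a substitution matrix such as BLOSUM62 with affine gap penalties. Percent identity is computed here over the aligned columns, which is only one of several conventions.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty.

    Returns (best score, aligned segment of a, aligned segment of b),
    with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment columns."""
    _, x, y = smith_waterman(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


print(smith_waterman("XXACGTYY", "ACGT"))  # (8, 'ACGT', 'ACGT')
```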
The person skilled in the art will appreciate that variants of the above protein or amino acid sequences can be made by conservative amino acid substitutions, and variants of any of the above sequences with conservative amino acid changes are provided as additional embodiments. Proteins with any of the above sequences but containing synthetic or non-naturally occurring amino acid analogues (and/or peptide linkages) are included in the embodiments here. A conservative amino acid substitution may be an amino acid substitution that does not alter the relative charge or size characteristics of the polypeptide in which the substitution is made. Amino acids are sometimes specified using the standard one-letter code: Alanine (A), Serine (S), Threonine (T), Aspartic acid (D), Glutamic acid (E), Asparagine (N), Glutamine (Q), Arginine (R), Lysine (K), Isoleucine (I), Leucine (L), Methionine (M), Valine (V), Phenylalanine (F), Tyrosine (Y), Tryptophan (W), Proline (P), Glycine (G), Histidine (H), Cysteine (C). "Hydrophobic amino acids" refers to A, L, I, V, P, F, W and M; "polar amino acids" refers to G, S, T, Y, C, N and Q; and "charged amino acids" refers to D, E, H, K and R. Conservative amino acid substitution may also include substitution of amino acids that are not critical for protein activity, or replacement of amino acids with other amino acids having similar properties (for example, acidic, basic, positively or negatively charged, polar or non-polar, hydrophobic, charged, etc.), such that substitutions of a critical amino acid do not substantially alter activity.
The following six groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). A person skilled in the art will appreciate that the substitutions identified above are not the only possible conservative substitutions. For example, in some cases all charged amino acids, whether positive or negative, can be considered conservative substitutions for one another. In addition, individual substitutions, deletions or additions that alter, add or delete a single amino acid, or a small percentage of amino acids, in an encoded sequence can also be conservative amino acid substitutions. Conservative amino acid substitution tables providing functionally similar amino acids are well known in the art, and conservative amino acid changes as known in the art are contemplated here. Conservative nucleotide substitutions in a nucleic acid that encodes an isolated protein are also contemplated in the present embodiments. Conservative nucleotide substitutions include, but are not limited to, those that effect a conservative amino acid substitution in the encoded amino acid sequence. In addition, degenerate conservative nucleotide substitutions can be made in a gene sequence by replacing a codon for an amino acid with a different codon for the same amino acid.
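The six substitution groups above can be encoded directly as sets. This sketch covers only those six groups, not the broader substitution tables the paragraph also mentions, and the function names are hypothetical.

```python
from typing import List

# The six groups of mutually conservative substitutions listed in the text.
CONSERVATIVE_GROUPS = [set("AST"), set("DE"), set("NQ"), set("RK"), set("ILMV"), set("FYW")]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same conservative group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def conservative_variants(seq: str, pos: int) -> List[str]:
    """All single-residue conservative variants of `seq` at index `pos`."""
    aa = seq[pos].upper()
    out: List[str] = []
    for g in CONSERVATIVE_GROUPS:
        if aa in g:
            out += [seq[:pos] + r + seq[pos + 1:] for r in sorted(g - {aa})]
    return out


print(is_conservative("I", "V"))       # True
print(conservative_variants("MA", 1))  # ['MS', 'MT']
```

Residues outside the six groups (G, P, H and C) get no variants from this helper, matching the groups as listed.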
Isolated proteins, polypeptides, oligopeptides or peptides, and variants thereof, can be prepared according to methods for preparing or altering polypeptide sequences and their coding nucleic acid sequences that are known to the person skilled in the art, such as those found in common molecular biology references, for example, Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989) or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York, which are incorporated herein as if fully set forth. Isolated proteins, polypeptides, oligopeptides or peptides can include natural amino acids, natural amino acid analogs or synthetic amino acid analogs.
In one embodiment, an isolated nucleic acid, or the complement thereof, is provided having a sequence that encodes an amino acid sequence with at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any one of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322. In one embodiment, a nucleic acid that encodes an amino acid sequence having less than 100% identity to the reference sequence encodes a variant of the reference sequence. In one embodiment, an isolated nucleic acid, polynucleotide or oligonucleotide is provided having a sequence that encodes an amino acid sequence with at least 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to an amino acid sequence of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 along a length of 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500, 10 to 600, 10 to 700, 10 to 800, 10 to 900 or 10 to all amino acids of a protein having the sequence of any of SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322. In one embodiment, the complement of the isolated nucleic acid, polynucleotide or oligonucleotide is provided. This list of sequence lengths encompasses every full-length protein in SEQ ID NOS: 1629 - 1784, 2373 - 2686 and 3313 - 3322 and every shorter length within the list, even for proteins that do not include more than 900 amino acids. For example, lengths of 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400 and 10 to all amino acids would apply to a 453 amino acid sequence. For sequences of 1,000 amino acids or greater, the range of sequence lengths can be extended in increments of 10 to 100N amino acids, where N is an integer of ten or greater.
Identity can be measured by the Smith-Waterman algorithm (Smith T.F., Waterman M.S. (1981), "Identification of Common Molecular Subsequences", Journal of Molecular Biology 147: 195-197, which is incorporated herein by reference as if fully set forth).
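The cited Smith-Waterman algorithm finds the highest-scoring local alignment between two sequences by dynamic programming. The text does not state a scoring scheme or how percent identity is normalized, so the sketch below assumes a simple match/mismatch score with a linear gap penalty and reports identity as identical aligned positions divided by alignment length. Production analyses would normally use a substitution matrix such as BLOSUM62 with affine gap penalties; the function names are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b.

    Assumed scoring: +match for identical residues, mismatch otherwise,
    and a linear penalty `gap` per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the local alignment length."""
    _, aln_a, aln_b = smith_waterman(a, b)
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
```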
In one embodiment, an isolated nucleic acid is provided having a sequence that hybridizes to a nucleic acid having the sequence of any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322, or to the complement thereof. In one embodiment, hybridization conditions can be of low stringency. In one embodiment, hybridization conditions can be of moderate stringency. In one embodiment, hybridization conditions can be of high stringency. Examples of hybridization protocols and methods for optimizing hybridization protocols are described in the following books: Molecular Cloning, T. Maniatis, E.F. Fritsch and J. Sambrook, Cold Spring Harbor Laboratory, 1982; and Current Protocols in Molecular Biology, F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, K. Struhl, Volume 1, John Wiley & Sons, 2000, which are incorporated by reference in their entirety as if fully set forth. By way of example, but not limitation, procedures for moderate-stringency hybridization conditions are as follows: filters containing DNA are pretreated for 2-4 hours at 68 °C in a solution containing 6X SSC (Amresco, Inc., Solon, OH), 0.5% SDS (Amresco, Inc., Solon, OH), 5X Denhardt's solution (Amresco, Inc., Solon, OH), and 100 µg/mL of denatured salmon sperm DNA (Invitrogen Life Technologies, Inc., Carlsbad, CA). Approximately 0.2 mL of the pretreatment solution is used per square centimeter of membrane. Hybridizations are performed in the same solution with the following modifications: 0.01 M EDTA (Amresco, Inc., Solon, OH), 100 µg/mL of salmon sperm DNA, and 5-20 × 10^6 cpm of 32P-labeled or fluorescently labeled probes can be used. The filters are incubated in the hybridization mixture for 16-20 h at 68 °C and then washed for 15 minutes at room temperature (within five degrees of 25 °C) in a solution containing 2X SSC and 0.1% SDS, with gentle agitation.
The washing solution is replaced with a solution containing 0.1X SSC and 0.5% SDS, and the filters are incubated for an additional 2 h at 68 °C with gentle agitation. The filters are blotted dry and exposed for development in an imager or by autoradiography. If necessary, the filters are washed a third time and exposed again for development. By way of example, but not limitation, low stringency refers to hybridization conditions that employ low temperatures for hybridization, for example, temperatures between 37 °C and 60 °C. By way of example, but not limitation, high stringency refers to hybridization conditions as presented above, but modified to employ high temperatures, for example, hybridization temperatures above 68 °C.
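The moderate-stringency protocol above scales its pretreatment volume with membrane area, and the text gives example hybridization temperatures for each stringency level. A small Python sketch of both, with hypothetical function names; the temperature mapping encodes only the examples given (37-60 °C low, 68 °C moderate, above 68 °C high) and is not a general definition of stringency, which also depends on salt concentration and probe composition:

```python
PRETREATMENT_ML_PER_CM2 = 0.2  # approximately 0.2 mL per square centimeter of membrane


def pretreatment_volume_ml(membrane_area_cm2: float) -> float:
    """Approximate pretreatment solution volume for a membrane of the given area."""
    return membrane_area_cm2 * PRETREATMENT_ML_PER_CM2


def example_stringency(hybridization_temp_c: float):
    """Map a hybridization temperature onto the example categories in the text.

    Returns None for temperatures the examples do not cover
    (below 37 °C, or between 60 °C and 68 °C).
    """
    if 37 <= hybridization_temp_c <= 60:
        return "low"
    if hybridization_temp_c == 68:
        return "moderate"
    if hybridization_temp_c > 68:
        return "high"
    return None


# A 10 cm x 10 cm membrane needs roughly 20 mL of pretreatment solution.
print(round(pretreatment_volume_ml(10 * 10), 1))  # 20.0
print(example_stringency(42), example_stringency(68), example_stringency(72))  # low moderate high
```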
In one embodiment, an isolated nucleic acid, polynucleotide or oligonucleotide that encodes at least a portion of any of the amino acid sequences of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 can be used as a hybridization probe or primer. In one embodiment, an isolated nucleic acid, polynucleotide or oligonucleotide having a sequence of, or complementary to, a portion of one of SEQ ID NOS: 1785-1923, 2052, 2058, 2687-3000 and 3323-3330 can be used as a hybridization probe or primer. Isolated nucleic acids, polynucleotides or oligonucleotides herein may have, but are not limited to, a length in the range of 10 to 100, 10 to 90, 10 to 80, 10 to 70, 10 to 60, 10 to 50, 10 to 40, 10 to 35, 10 to 30, 10 to 25, 10 to 20 or 10 to 15 nucleotides, or 20 to 30 nucleotide residues, or 25 nucleotide residues. A range of nucleotide sequence lengths mentioned here includes each nucleotide sequence length within the range, endpoints inclusive. The mentioned nucleotide length can start at any single position within a reference sequence at which enough nucleotides follow that position to accommodate the mentioned length. In one embodiment, a hybridization probe or primer is 85 to 100%, 90 to 100%, 91 to 100%, 92 to 100%, 93 to 100%, 94 to 100%, 95 to 100%, 96 to 100%, 97 to 100%, 98 to 100%, 99 to 100% or 100% complementary to a nucleic acid of the same length as the probe or primer, having a sequence chosen from a stretch of nucleotides, corresponding to the length of the probe or primer, within a nucleic acid encoding one of the proteins of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 or the complement of that nucleic acid.
In one embodiment, a hybridization probe or primer is 85 to 100%, 90 to 100%, 91 to 100%, 92 to 100%, 93 to 100%, 94 to 100%, 95 to 100%, 96 to 100%, 97 to 100%, 98 to 100%, 99 to 100% or 100% complementary to a nucleic acid of the same length as the probe or primer, having a sequence chosen from a stretch of nucleotides, corresponding to the length of the probe or primer, within a nucleic acid having the sequence of one of SEQ ID NOS: 1785-1923, 2052, 2058, 2687-3000 and 3323-3330 or the complement of that nucleic acid. In one embodiment, a hybridization probe or primer hybridizes along its length with a corresponding length of a nucleic acid that encodes the sequence of one of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322, or the complement of that nucleic acid. In one embodiment, a hybridization probe or primer hybridizes along its length with a corresponding length of a nucleic acid having the sequence of one of SEQ ID NOS: 1785-1923, 2052, 2058, 2687-3000 and 3323-3330, or the complement of that nucleic acid. In one embodiment, hybridization can occur under conditions of low stringency. In one embodiment, hybridization can occur under conditions of moderate stringency. In one embodiment, hybridization can occur under conditions of high stringency.
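The percent-complementarity language above compares a probe or primer with an equal-length stretch of a reference nucleic acid, starting at any position that leaves enough nucleotides. A minimal Python sketch of that comparison, assuming DNA (A/C/G/T) and Watson-Crick pairing of the probe with the antiparallel target strand. It counts base pairs only and does not predict whether hybridization occurs under a given stringency; the function names and the toy reference are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementary(probe: str, target: str) -> float:
    """Percent of probe positions that base-pair with an equal-length target stretch."""
    if len(probe) != len(target):
        raise ValueError("probe and target stretch must have the same length")
    # A probe that pairs perfectly with `target` equals its reverse complement.
    paired = sum(p == q for p, q in zip(probe.upper(), reverse_complement(target)))
    return 100.0 * paired / len(probe)


def best_match(probe: str, reference: str):
    """Scan every start position in `reference`; return (best percent, 0-based start)."""
    n = len(probe)
    best = (0.0, -1)
    for start in range(len(reference) - n + 1):
        score = percent_complementary(probe, reference[start:start + n])
        if score > best[0]:
            best = (score, start)
    return best


reference = "ATGGCCAAGTTTGACCTGAAA"
probe = reverse_complement(reference[3:13])
print(best_match(probe, reference))  # (100.0, 3)
```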
Isolated nucleic acids, polynucleotides or oligonucleotides of embodiments herein can include natural nucleotides, natural nucleotide analogs or synthetic nucleotide analogs. Nucleic acids, polynucleotides or oligonucleotides of embodiments herein can be any type of nucleic acid, including deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or peptide nucleic acid (PNA). SEQ ID NOS: 1785-1923 are listed as DNA sequences, but RNA sequences in which U replaces T in SEQ ID NOS: 1785-1923 are also contemplated as nucleic acids of the embodiments herein.
Although unlabeled hybridization probes or primers can be used in the embodiments herein, hybridization probes or primers can be detectably labeled and used to detect, sequence or synthesize nucleic acids. Exemplary labels include, but are not limited to, radionuclides, light-absorbing chemical moieties, dyes and fluorescent moieties. The label can be a fluorescent moiety, such as 6-carboxyfluorescein (FAM), 6-carboxy-4,7,2',7'-tetrachlorofluorescein (TET), rhodamine, JOE (2,7-dimethoxy-4,5-dichloro-6-carboxyfluorescein), HEX (hexachloro-6-carboxyfluorescein) or VIC.
In one embodiment, an isolated nucleic acid, polynucleotide or oligonucleotide that encodes an intein-modified protein, a variant of an intein-modified protein or a fragment of an intein-modified protein is provided in an expression construct suitable for expression in a desired host. The fragment of an intein-modified protein can include a portion of the intein-modified protein that retains the activity of the intein-modified protein. However, the fragment may also have another utility, such as, but not limited to, serving as an antigen to prepare an antibody, which can then be used to detect an intein-modified protein, or a fragment thereof, in or extracted from a plant, plant tissue, plant cell, or subcellular plant region or compartment. The nucleic acid can include a sequence that encodes an amino acid sequence with at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322. A nucleic acid in an expression construct that encodes a fragment of the intein-modified protein can encode an amino acid sequence having 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to an amino acid sequence of any one of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 over 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500, 10 to 600, 10 to 700, 10 to 800, 10 to 900 or 10 to all amino acids of a protein having the sequence of any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322. This list of sequence lengths encompasses each full-length protein in SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 and each shorter length within the list, even for proteins that do not include more than 900 amino acids. For example, lengths of 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400 and 10 to all amino acids would apply to a 453-amino-acid sequence.
The list of sequence lengths can be extended by increments of 10 to 100N amino acids, where N is an integer of ten or more, for sequences of 1,000 amino acids or greater. The nucleic acid can include a sequence that hybridizes to a nucleic acid having the sequence of one of SEQ ID NOS: 1785-1923, 2052, 2058, 2687-3000 and 3323-3330, or its complement. In one embodiment, hybridization can occur under conditions of moderate stringency. In one embodiment, hybridization can occur under conditions of low stringency. In one embodiment, hybridization can occur under conditions of high stringency.
The expression construct can be any expression construct suitable for expression of the intein-modified protein or fragment thereof in a suitable host. In one embodiment, the expression construct is pAG2005 (SEQ ID NO: 1) or any expression construct having at least 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to the sequence of SEQ ID NO: 1. In a preferred embodiment, a nucleic acid that encodes any of the proteins in the preceding paragraph, or a fragment thereof, is provided in pAG2005. The nucleic acid can be cloned at the KpnI and EcoRI sites in pAG2005, under the control of the rice ubiquitin promoter.
Isolated nucleic acids, polynucleotides or oligonucleotides in an expression construct can be codon optimized for an expression host. Codon optimization can be, but is not limited to, codon optimization for a plant. Codon optimization can be for switchgrass, corn, Miscanthus, sorghum, sugarcane, wheat or rice.
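One simple form of codon optimization back-translates each amino acid into a single codon preferred by the expression host. The sketch below assumes that approach. The preference table is a hypothetical placeholder, not real codon-usage data for switchgrass, corn or any other host, and practical optimization usually also weighs GC content, mRNA secondary structure and unwanted sequence motifs.

```python
def back_translate(protein: str, preferred_codons: dict) -> str:
    """Encode `protein` using one preferred codon per amino acid ("*" = stop)."""
    missing = sorted(set(protein) - set(preferred_codons))
    if missing:
        raise KeyError(f"no preferred codon for: {', '.join(missing)}")
    return "".join(preferred_codons[aa] for aa in protein)


# Hypothetical preference table for illustration only; NOT measured codon usage.
EXAMPLE_PREFERRED = {"M": "ATG", "K": "AAG", "V": "GTG", "A": "GCC", "*": "TGA"}

print(back_translate("MKVA*", EXAMPLE_PREFERRED))  # ATGAAGGTGGCCTGA
```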
The host for an expression construct containing one or more of the nucleic acids, polynucleotides or oligonucleotides can be a plant. The plant may be a monocot. The monocot can be, but is not limited to, switchgrass, corn, Miscanthus, sorghum, sugarcane, wheat or rice. The plant can be a dicot. The dicot can be, but is not limited to, soybean, canola, poplar, willow or rapeseed. The expression construct can be pAG2005 (SEQ ID NO: 1), which is illustrated in FIGS. 2A-2B. The nucleic acid in the expression construct can be operably linked to a promoter. The promoter can control the expression of the intein-modified protein or fragment thereof, and the promoter can be, but is not limited to, a plant ubiquitin promoter system, the corn ubiquitin promoter, a modified corn ubiquitin promoter that lacks one or more heat shock elements, the rice ubiquitin promoter, the rice actin 1 promoter, the rice actin 2 promoter, the gamma-zein promoter, the glutelin promoter, the corn PR-1 promoter, the corn alcohol dehydrogenase promoter, the CaMV 19S promoter, the CaMV 35S promoter, the enhanced 35S promoter, the mas promoter, the 35S minimal promoter, the Arabidopsis PR-1 promoter, the tobacco PR-1a promoter, the nopaline synthase promoter, a soybean heat shock promoter, the octopine synthase promoter, the mannopine synthase promoter, a synthetic promoter, an alcohol-inducible promoter, a tetracycline-inducible promoter, a steroid-inducible promoter, a hormone-inducible promoter, an ecdysone receptor-based promoter, a yeast copper-responsive promoter, a metallothionein promoter, a heat-regulated promoter, a cold-inducible promoter, the potato alpha-amylase promoter, a light-regulated promoter, a corn chlorophyll a/b promoter, a light- and dark-active Cab promoter, a tissue-specific promoter, a root promoter, a seed-specific promoter, or a constitutive promoter.
The promoter can be a constitutive or inducible promoter and can be the rice ubiquitin, corn ubiquitin, gamma-zein, glutelin or rice actin promoter. The nucleic acid can be provided in pAG2005 operably linked to the rice ubiquitin promoter, and the construct can be provided in switchgrass, corn, Miscanthus, sorghum, sugarcane, wheat or rice. The nucleic acid can be cloned at the KpnI and EcoRI sites in pAG2005, under the control of the rice ubiquitin promoter. In one embodiment, if the nucleic acid in any of the above expression constructs encodes an amino acid sequence having less than 100% identity to any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322, it encodes a variant of that amino acid sequence.
Referring to FIGS. 2A-2B, pAG2005 (SEQ ID NO: 1) includes an Oryza sativa ubiquitin 3 gene promoter with the first intron (OsUbi3 promoter, nucleotides 12-2094), a sequence encoding the phosphomannose isomerase enzyme used for selection of transformants (PMI, nucleotides 2104-3279), a T-DNA left border (LB, nucleotides 3674-3698), a ColE1 origin of replication (Ori, nucleotide 6970), a T-DNA right border (RB, nucleotides 9717-9741), a second OsUbi3 promoter with the first intron (nucleotides 9948-12015), and a Nos terminator (nucleotides 12035-12310), where the nucleotide numbers are indexed relative to nucleotide 1 within the EcoRI sequence at the 5' end of the OsUbi3 promoter that drives PMI.
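The pAG2005 feature coordinates listed above can be kept as a small annotation table, which makes feature lengths easy to derive. A Python sketch using only the coordinates stated in the text (1-based, inclusive); the class and variable names are hypothetical, and the ColE1 origin is recorded as the single nucleotide position given:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PAG2005_FEATURES = [
    Feature("OsUbi3 promoter + first intron (drives PMI)", 12, 2094),
    Feature("PMI coding sequence", 2104, 3279),
    Feature("T-DNA left border (LB)", 3674, 3698),
    Feature("ColE1 origin (Ori)", 6970, 6970),
    Feature("T-DNA right border (RB)", 9717, 9741),
    Feature("OsUbi3 promoter + first intron (second copy)", 9948, 12015),
    Feature("Nos terminator", 12035, 12310),
]

for f in PAG2005_FEATURES:
    print(f"{f.name}: {f.start}-{f.end} ({f.length} nt)")
```

From these coordinates, the PMI coding sequence spans 1,176 nt (a whole number of codons) and each T-DNA border spans 25 nt.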
In one embodiment, a transgenic plant is provided that includes one or more of the isolated nucleic acids, polynucleotides, oligonucleotides and/or expression constructs herein. The isolated nucleic acid, polynucleotide, oligonucleotide and/or expression construct can be introduced into the plant by Agrobacterium-mediated transformation or by any other suitable method known in the art. Agrobacterium-mediated transformation of immature corn embryos can be performed as described in Negrotto, et al., (2000) Plant Cell Reports 19: 798-803, which is incorporated by reference as if fully set forth.
The embodiments herein also include mutant inteins, which may have uses including, but not limited to, modifying a protein. Mutant inteins include, but are not limited to, those having at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any of SEQ ID NOS: 92-103 and 2373-2686, or to any of the inteins contained in any of SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708, 1710 and 3315-3322. The embodiments also include a nucleic acid that encodes a mutant intein including, but not limited to, mutant inteins having at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any of SEQ ID NOS: 92-103 and 2373-2686, or to any of the inteins contained in any of SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708, 1710 and 3315-3322. The embodiments also include a nucleic acid encoding a mutant intein, where the nucleic acid hybridizes with a nucleic acid, or its complement, having the sequence of one of the intein-encoding sequences contained in any of SEQ ID NOS: 3323-3330. In one embodiment, hybridization can occur under conditions of low stringency. In one embodiment, hybridization can occur under conditions of moderate stringency. In one embodiment, hybridization can occur under conditions of high stringency. A mutant intein can be inducible to cleave and/or splice from a protein into which it is inserted. Induction conditions may include exposure of the intein to changes in physical or chemical conditions, such as, but not limited to, changes in temperature, pH, concentration of splicing inhibitors, concentration of ligands, light, salinity and pressure. The induction condition can be, but is not limited to, an elevated temperature. The elevated temperature may be within, but is not limited to, the range of 50-70 °C, which includes temperatures of 50 °C and 70 °C.
The elevated temperature can be greater than or equal to any integer temperature within the range of 25-70 °C, endpoints included. The elevated temperature can be greater than or equal to 50 °C, 55 °C, 59.9 °C, 60 °C, 65 °C or 70 °C. An intein having at least 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to a protein having the sequence of any of SEQ ID NOS: 2, 3, 4-103, 113-392, or to any of the inteins contained in any of SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708 and 1710, can be used to modify a protein, enzyme, cellulase or xylanase. A nucleic acid that hybridizes to a nucleic acid encoding any of SEQ ID NOS: 92-103, or any of the inteins in any of SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708 and 1710, or to a complement thereof, can be used to modify a protein, enzyme, cellulase or xylanase at the nucleic acid level. The intein sequence in each of SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708 and 1710 can be found by comparing each of these sequences with the sequence of SEQ ID NO: 91.
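Locating an inserted intein by comparison with a reference amounts to finding where the two sequences stop agreeing from each end. A minimal Python sketch, assuming the reference is the same protein without the insert; it illustrates the comparison idea only, and a real analysis would use a pairwise alignment. When the inserted segment begins or ends with the same residue as the adjacent extein, the boundary is ambiguous and this shortcut reports one valid placement. The function name and toy sequences are hypothetical.

```python
def locate_insertion(native: str, modified: str):
    """Return (0-based insertion index, inserted segment) of `modified` relative to `native`."""
    limit = min(len(native), len(modified))
    prefix = 0
    while prefix < limit and native[prefix] == modified[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and native[-1 - suffix] == modified[-1 - suffix]:
        suffix += 1
    return prefix, modified[prefix:len(modified) - suffix]


# Toy example: a 5-residue stand-in "intein" inserted after residue 4 of a toy protein.
print(locate_insertion("MKTAYIAK", "MKTACGLHNYIAK"))  # (4, 'CGLHN')
```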
As described above, the embodiments include amino acid sequences having 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to an amino acid sequence of any one of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 over 6, 10 to 50, 10 to 100, 10 to 150, 10 to 300, 10 to 400, 10 to 500, 10 to 600, 10 to 700, 10 to 800, 10 to 900 or 10 to all amino acids of a protein having the sequence of one of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322. The embodiments also include nucleic acids that encode these amino acid sequences, and antibodies that recognize epitopes in these amino acid sequences. An amino acid sequence of less than full length can be selected from any portion of one of the sequences of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 corresponding to the mentioned length of amino acids. An amino acid sequence of less than full length can be selected from a portion of any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 that includes an upstream intein-extein junction, with the C-terminal residue of the N-extein and the N-terminal residue of the intein at any two adjacent positions within it. For example, positions 134 and 135 in each of SEQ ID NOS: 3313-3322 are the C-terminal residue of the N-extein and the N-terminal residue of the intein for each respective sequence, and an amino acid sequence of less than full length selected from any of SEQ ID NOS: 3313-3322 can include residues 134 and 135 at any two respective consecutive positions within the mentioned length. An amino acid sequence of less than full length can be selected from a portion of any one of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322 that includes a downstream intein-extein junction, with the C-terminal residue of the intein and the N-terminal residue of the C-extein at any two adjacent positions within it.
For example, positions 616 and 617 in each of SEQ ID NOS: 3313-3322 are the C-terminal residue of the intein and the N-terminal residue of the C-extein for each respective sequence, and an amino acid sequence of less than full length selected from any of SEQ ID NOS: 3313-3322 can include residues 616 and 617 at any two respective consecutive positions within the mentioned length. An amino acid sequence of less than full length can be selected from a portion of any of SEQ ID NOS: 1629-1784, 2373-2686 and 3313-3322, with the selected portion including at least one amino acid that differs from the native intein or native protein sequence at a position within the portion. For example, the following sequences each include a mutation (written after the sequence ID as the original amino acid, its position and the replacement amino acid) with respect to a base sequence (SEQ ID NO: 2518): SEQ ID NO: 3315/R322H; SEQ ID NO: 3315/R398W; SEQ ID NO: 3315/I412V; SEQ ID NO: 3315/T415M; SEQ ID NO: 3316/D188E; SEQ ID NO: 3316/K245N; SEQ ID NO: 3316/T402A; SEQ ID NO: 3316/R504G; SEQ ID NO: 3316/K566N; SEQ ID NO: 3317/K245M; SEQ ID NO: 3317/D418V; SEQ ID NO: 3317/S585I; SEQ ID NO: 3318N231L; SEQ ID NO: 3318/P282S; SEQ ID NO: 3318/K402M; SEQ ID NO: 3318/E545D; SEQ ID NO: 3318/I618N; SEQ ID NO: 3319/P134S; SEQ ID NO: 3319/E405K; SEQ ID NO: 3319/N747Y; SEQ ID NO: 3320/P134S; SEQ ID NO: 3320/R345M; SEQ ID NO: 3320N589D; SEQ ID NO: 3321/T30I; SEQ ID NO: 3321/E331G; SEQ ID NO: 3321/G366E; SEQ ID NO: 3321/L578M; SEQ ID NO: 3322/P189L; SEQ ID NO: 3322/G242A and SEQ ID NO: 3322/N730D. An amino acid sequence of less than full length selected from one of SEQ ID NOS: 3313-3322 may include one or more of the amino acid changes listed above with respect to SEQ ID NO: 2518. An amino acid sequence of less than full length selected from any other sequence herein having one or more amino acid changes relative to the native intein or native enzyme can be selected similarly.
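Each mutation above is written as the original amino acid, its 1-based position and the replacement amino acid (for example, R322H replaces arginine 322 with histidine). A Python sketch that parses this notation and applies a change to a sequence, checking that the original residue matches; the helper names are hypothetical and the toy sequence is not one of the listed SEQ ID NOS:

```python
import re

# One-letter original residue, 1-based position, one-letter replacement residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str):
    """Split a code such as "R322H" into ("R", 322, "H")."""
    match = _MUTATION.match(code)
    if not match:
        raise ValueError(f"unrecognized mutation code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(seq: str, code: str) -> str:
    """Return `seq` with the substitution applied, after checking the original residue."""
    old, pos, new = parse_mutation(code)
    if seq[pos - 1] != old:
        raise ValueError(f"expected {old} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


print(apply_mutation("MKTAYIAK", "T3M"))  # MKMAYIAK
```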
The relative positions of the amino acid changes with respect to each other can be maintained if the selected less-than-full-length amino acid sequence includes more than one amino acid change. However, the change or changes can otherwise appear anywhere within the mentioned length of the less-than-full-length amino acid sequence. A nucleic acid provided here can encode any of these less-than-full-length amino acid sequences. A nucleic acid provided here can be of any length described above, including a length encoding a less-than-full-length amino acid sequence that is a portion of an intein-modified protein having at least one of an upstream intein-extein junction, a downstream intein-extein junction, or a change in the intein-modified protein sequence compared with the native intein or native protein sequence. The nucleotides encoding the junctions or sites of change can be located at any respective position along the mentioned length of the nucleic acid. An antibody provided here can recognize an epitope on any of these less-than-full-length amino acid sequences. The epitope may include an upstream intein-extein junction, a downstream intein-extein junction, one or more changes in the less-than-full-length amino acid sequence relative to the native intein or native protein sequence, or any other sequence in the less-than-full-length amino acid sequence.
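Because a less-than-full-length sequence can place a junction at any two consecutive positions within its length, the candidate windows around a junction are easy to enumerate. A Python sketch, assuming 1-based residue numbering as in the text (positions 134/135 for the upstream junction and 616/617 for the downstream junction of SEQ ID NOS: 3313-3322); the window length of 10 and the total sequence length of 750 used below are hypothetical values chosen for illustration:

```python
def windows_spanning(pos_a: int, pos_b: int, window: int, seq_len: int) -> list:
    """1-based start positions of `window`-residue stretches containing both positions."""
    lo, hi = min(pos_a, pos_b), max(pos_a, pos_b)
    first = max(1, hi - window + 1)
    last = min(lo, seq_len - window + 1)
    return list(range(first, last + 1))


upstream = windows_spanning(134, 135, window=10, seq_len=750)
downstream = windows_spanning(616, 617, window=10, seq_len=750)
print(upstream[0], upstream[-1], len(upstream))        # 126 134 9
print(downstream[0], downstream[-1], len(downstream))  # 608 616 9
```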
Any single embodiment herein can be supplemented with one or more elements from any one or more other embodiments herein.
Examples - The following non-limiting examples are provided to illustrate particular embodiments. The embodiments throughout the text can be supplemented with one or more details from any one or more of the examples below.
Example 1 - Prediction of intein insertion sites. The use of A) the local sequence, as scored by an SVM, B) the distance from the site to the active-site residues, and C) the proximity of the insertion site to local secondary structure (for example, the end of an alpha-helix or beta-sheet) allowed the prediction of insertion sites in the following xylanases and cellulases: Bacillus sp. NG-27 xylanase (accession number O30700 (SEQ ID NO: 106)); Clostridium stercorarium xynB xylanase (accession number P40942 (SEQ ID NO: 108)); Thermomyces lanuginosus xynA xylanase (accession number O43097 (SEQ ID NO: 107)); Dictyoglomus thermophilum xynB xylanase (accession number P77853 (SEQ ID NO: 104)); Clostridium thermocellum celK cellulase (accession number O68438 (SEQ ID NO: 109)); Thermomonospora fusca celB cellulase (accession number P26222 (SEQ ID NO: 110)); Acidothermus cellulolyticus cellulase (accession number P54583 (SEQ ID NO: 111)); and Nasutitermes takasagoensis cellulase (accession number O77044 (SEQ ID NO: 112)). For each of these xylanases and cellulases, the distance between each C/T/S site in the enzyme and its active site was calculated as the shortest distance between any atom in the C/T/S residue and any atom in any of the active-site residues. Next, the SVM score of each NNNXNN local sequence cassette, where X is C, T or S, was obtained. The SVM was trained and used as described above, using the intein insertion cassette sequences of SEQ ID NOS: 1233-1512.
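The two per-site quantities in this example, the NNNXNN cassette around each C/T/S residue and the shortest atom-to-atom distance to the active site, can be computed as follows. A Python sketch, assuming atom coordinates (in angstroms) have already been read from a structure file; parsing the structure and defining the active-site residues are outside its scope, and the function names are hypothetical. Sites lacking three upstream and two downstream residues are skipped, which is an assumption about edge handling.

```python
import math


def cts_cassettes(seq: str) -> list:
    """Return (1-based position, 6-residue NNNXNN cassette) for each C/T/S site X."""
    cassettes = []
    for i, aa in enumerate(seq):
        # X is the 4th residue of the cassette: three residues before, two after.
        if aa in "CTS" and i >= 3 and i + 2 < len(seq):
            cassettes.append((i + 1, seq[i - 3:i + 3]))
    return cassettes


def min_distance(site_atoms, active_site_atoms) -> float:
    """Shortest distance between any site atom and any active-site atom (x, y, z tuples)."""
    return min(math.dist(a, b) for a in site_atoms for b in active_site_atoms)


print(cts_cassettes("MKLSAGTWQCE"))  # [(4, 'MKLSAG'), (7, 'SAGTWQ')]
print(min_distance([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]))  # 5.0
```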
The SVM was validated using: 1) a random set of m (m varied from 1 to 250) true-positive training-set sites with unique sequences, selected from the intein-containing protein library of SEQ ID NOS: 1233-1512; 2) true negatives, including 3 other random cassettes from the extein sequences from which the true-positive insertion cassettes were selected (SEQ ID NOS: 673-1232); 3) the remaining intein insertion site cassette sequences of SEQ ID NOS: 1233-1512 as true-positive test sets, with the known intein insertion sites filtered to remove sequences present in the training set; and 4) true negatives in the test set selected from other C/S/T sites in the extein sequences (SEQ ID NOS: 673-1232). Each true negative in the training set had the same central amino acid X as the corresponding true positive, but no intein was inserted at that true-negative amino acid position.
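The training design above pairs each true-positive cassette with negatives that share the same central C/T/S residue. The sketch below shows one way to encode cassettes and draw such matched negatives. The one-hot encoding is an assumed feature representation (this section refers to an earlier description of the SVM rather than restating its features), and the encoded vectors would then be passed to an SVM implementation of choice; the names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(cassette: str) -> list:
    """Encode a cassette as len(cassette) x 20 binary features (assumed encoding)."""
    vec = [0] * (len(cassette) * len(AMINO_ACIDS))
    for i, aa in enumerate(cassette):
        vec[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1
    return vec


def matched_negatives(positives, candidates, per_positive=1, seed=0):
    """For each positive NNNXNN cassette, sample negatives with the same central residue X."""
    rng = random.Random(seed)
    by_center = {}
    for cassette in candidates:
        by_center.setdefault(cassette[3], []).append(cassette)
    chosen = []
    for cassette in positives:
        pool = by_center.get(cassette[3], [])
        chosen.extend(rng.sample(pool, min(per_positive, len(pool))))
    return chosen


negatives = matched_negatives(["MKLSAG"], ["AAATAA", "GGGSGG", "PPPSPP"])
print(negatives)  # one of the two S-centered candidates
```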
Sites that were at or within 10 angstroms of the active site and/or that had an SVM score greater than 0 were included for further analysis. Sites that scored high in the SVM classification but were farther than 20 angstroms from the active site were excluded. Next, the secondary structure of all candidate sites was determined, and sites located at a loop/α-helix or loop/β-sheet junction were prioritized. Sites located on long surface loops that were not immediately adjacent to the active site, and sites in the core of the protein, were also excluded. A list of such predicted insertion sites is shown in Table 2, below.
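The filtering steps in this example can be expressed as a short pipeline over per-site attributes. A Python sketch, assuming the distance, SVM score and structural flags for each site have already been computed; the 10 Å, 20 Å and score-of-0 thresholds come from the text, while the field names and the tie-breaking by SVM score are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    position: int
    distance_a: float        # shortest atom-atom distance to the active site, in angstroms
    svm_score: float
    loop_junction: bool      # at a loop/alpha-helix or loop/beta-sheet junction
    distal_long_loop: bool   # on a long surface loop not adjacent to the active site
    buried: bool             # in the protein core


def select_sites(sites):
    kept = []
    for s in sites:
        if not (s.distance_a <= 10.0 or s.svm_score > 0):
            continue  # neither close to the active site nor positively scored
        if s.distance_a > 20.0:
            continue  # possibly high-scoring, but too far from the active site
        if s.distal_long_loop or s.buried:
            continue
        kept.append(s)
    # Prioritize loop-junction sites, then higher SVM scores.
    return sorted(kept, key=lambda s: (not s.loop_junction, -s.svm_score))


sites = [
    CandidateSite(10, 8.0, -0.5, False, False, False),
    CandidateSite(20, 15.0, 0.9, True, False, False),
    CandidateSite(30, 25.0, 1.2, True, False, False),
    CandidateSite(40, 5.0, 0.8, True, False, True),
]
print([s.position for s in select_sites(sites)])  # [20, 10]
```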

Example 2 - Assays for cloning, expression and xylanase activity. Wild-type xylanases were cloned for expression in a lambda (λ) phage expression and screening system. Nucleic acids encoding nine xylanases were amplified by PCR with or without a 6-His tag attached to the carboxy terminus (referred to herein as the "C terminus") of the coding sequence. These xylanases were the uncultured bacterium GH11 xylanase (accession number EU591743 (SEQ ID NO: 1924)), Bacillus sp. NG-27 xylanase (accession number O30700 (SEQ ID NO: 106)), Thermomyces lanuginosus xynA xylanase (accession number O43097 (SEQ ID NO: 107)), Clostridium stercorarium xynA xylanase (accession number P33558 (SEQ ID NO: 1925)), Clostridium thermocellum xynY xylanase (accession number P51584 (SEQ ID NO: 105)), Dictyoglomus thermophilum xynB xylanase (accession number P77853 (SEQ ID NO: 104)), Clostridium stercorarium xynB xylanase (accession number P40942 (SEQ ID NO: 108)), Erwinia chrysanthemi xylanase (accession number Q46961 (SEQ ID NO: 1926)), and Thermotoga sp. xynA xylanase (accession number Q60044 (SEQ ID NO: 1927)). PCR products were digested with EcoRI/XhoI (37 °C for one hour), column purified (MinElute PCR Purification Kit, Qiagen), and ligated (4 °C for at least 40 hours, or 12 °C for at least 12 hours) into predigested λ ZAP™ II vector (Stratagene). The expression cassette in the λ ZAP™ II vector is shown in FIG. 35, with a gene of interest represented by the gray box. Once the enzyme genes were ligated into the predigested vector, the vectors containing the enzyme genes were packaged into lambda phage (room temperature for 2 hours) with phage packaging extract (Stratagene). The recombinant phage was used to infect E. coli XL1-Blue MRF' cells (Stratagene) and plated on NZY agar plates (described in the ZAP®-cDNA Gigapack III Gold Cloning Kit, Stratagene) containing 0.2% AZCL-xylan substrate (Megazyme).
NZY agar plates include 10 g NZ amine (casein hydrolysate), 5 g NaCl, 2 g MgSO4·7H2O, 5 g yeast extract and 15 g agar per liter, with the pH adjusted to 7.5 with NaOH, and are sterilized in an autoclave, as described by the vendor (Stratagene). AZCL-xylan substrate (Megazyme) is azurine-cross-linked xylan, which is hydrolyzed to release dye and produce a blue color. After overnight incubation at 37 °C, the plates were visually inspected for blue color development in and around phage plaques. Xylanase activity was scored as active or inactive based on the ability to hydrolyze the AZCL-xylan substrate and thereby develop a blue color in and around the phage plaque. Selected plaques were confirmed by PCR as containing the xylanase gene of interest and were replated on NZY agar plates containing 0.2% AZCL-xylan to confirm the xylanase enzymatic activity of the phage plaque.
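Scaling the NZY agar recipe above to another batch volume is simple proportional arithmetic. A Python sketch, assuming the 0.2% AZCL-xylan is weight per volume (2 g per liter), which the text does not state explicitly; the function name is hypothetical, and the pH adjustment and sterilization steps are unchanged by scaling.

```python
NZY_AGAR_G_PER_L = {
    "NZ amine (casein hydrolysate)": 10.0,
    "NaCl": 5.0,
    "MgSO4.7H2O": 2.0,
    "yeast extract": 5.0,
    "agar": 15.0,
}


def nzy_azcl_batch(volume_l: float, azcl_percent_wv: float = 0.2) -> dict:
    """Grams of each component for `volume_l` liters of NZY agar with AZCL-xylan."""
    batch = {name: grams * volume_l for name, grams in NZY_AGAR_G_PER_L.items()}
    # Assumption: percent is w/v, so 1% corresponds to 10 g per liter.
    batch["AZCL-xylan"] = azcl_percent_wv * 10.0 * volume_l
    return batch


for name, grams in nzy_azcl_batch(0.5).items():
    print(f"{name}: {grams:g} g")
```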
Each xylanase-expressing phage isolate was amplified in E. coli XL1-Blue MRF' cells to generate high-titer phage lysate, which was used in a second infection of E. coli XL1-Blue MRF' cells (Stratagene) in the presence of isopropyl β-D-1-thiogalactopyranoside (IPTG, dioxane-free, 99% pure; available from Research Products International, Corp.) to induce xylanase expression. Aliquots of individual lysates were incubated at different temperatures ranging from 4 °C to 70 °C for up to four hours and then cooled to 4 °C for at least two hours. Xylanase activity of each lysate was measured either with an EnzChek® kit (Invitrogen™) or by adding 0.2% AZCL-xylan substrate and incubating at 37 °C or 70 °C for 4 hours.
Xylanase activity was compared on NZY agar plates containing AZCL-xylan and in liquid assays. On NZY agar plates supplemented with AZCL-xylan substrate, P77853 gave the strongest activity with or without the C-terminal His tag, followed by P51584, O43097 and O30700. In all cases, the 6-His tag suppressed at least some xylanase activity.
Example 3 - Insertion of inteins into xylanases.
Several inteins were inserted into a subset of the predicted sites shown in Table 2, using a PCR approach. First, three pieces of DNA, "N" (for the amino-terminal, or N-extein, fragment) and "C" (for the carboxy-terminal, or C-extein, fragment) from a xylanase, and "I" (for intein) from an intein, were generated separately by PCR (Phusion™ DNA polymerase (New England Biolabs), following the manufacturer's procedures). The intein fragment I was amplified so that it would have a 20-nucleotide overlap with the C-terminus of the xylanase N PCR fragment and a 20-nucleotide overlap with the N-terminus of the xylanase C PCR fragment. The N, I and C fragments were then assembled into a contiguous gene encoding an intein-modified enzyme using a two-step PCR (AccuPrime™ Pfx DNA polymerase (Invitrogen)). As used herein, "NIC" represents the fusion of the xylanase N-terminal DNA fragment to the desired intein, which is also fused to the xylanase C-terminal DNA fragment. Although "NIC" is used in the context of an intein-modified xylanase in this example, "NIC" can refer to the contiguous N-extein, intein and C-extein sequence of any intein-modified protein. A naming convention for the different constructs was adopted with the following format: (Target Enzyme)-(Intein)-(Insertion Site)-(Mutant Number); for example, the Tth intein inserted in P77853 at S158 would be named P77853-Tth-S158. Likewise, the Tth intein inserted in P77853 at T134 would be named P77853-Tth-T134. Mutants of any intein-modified enzyme would then be numbered sequentially with an additional suffix, e.g., P77853-Tth-S158-1, P77853-Tth-S158-2, P77853-Tth-S158-3, P77853-Tth-S158-4, etc.
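The construct naming convention and the 20-nucleotide overlaps used to join N, I and C can both be sketched in a few lines. In this Python sketch, the intein-fragment primers carry 20 nucleotides of N-fragment or C-fragment sequence as 5' tails, plus an annealing region on the intein whose 20-nucleotide length is an assumption (the text specifies only the overlaps); the function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def construct_name(enzyme: str, intein: str, site: str, mutant: Optional[int] = None) -> str:
    """(Target Enzyme)-(Intein)-(Insertion Site), plus -(Mutant Number) for mutants."""
    name = f"{enzyme}-{intein}-{site}"
    return name if mutant is None else f"{name}-{mutant}"


def intein_overlap_primers(n_frag: str, intein: str, c_frag: str,
                           overlap: int = 20, anneal: int = 20):
    """Forward/reverse primers that add N- and C-fragment overlaps to the intein PCR product."""
    forward = n_frag[-overlap:] + intein[:anneal]
    reverse = reverse_complement(intein[-anneal:] + c_frag[:overlap])
    return forward, reverse


print(construct_name("P77853", "Tth", "S158"))     # P77853-Tth-S158
print(construct_name("P77853", "Tth", "S158", 2))  # P77853-Tth-S158-2
```

Because each primer's tail copies the end of a neighboring fragment, the amplified intein shares 20 nucleotides with N at one end and with C at the other, which is what lets the assembly PCR anneal the three pieces into one NIC sequence.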
Generally, the first step of NIC assembly uses 100 ng of each of the nucleic acids encoding N, I and C in a master mix containing 1X PCR reaction buffer, 200 µM of each dNTP and one unit of Pfx DNA polymerase in 12.5 µL, cycled at 95 °C for two minutes, followed by five three-step thermal cycles of 95 °C for 20 seconds, 45 °C for one minute and 68 °C for two minutes (alternatively, three minutes can be used for longer genes), followed by a final extension at 68 °C for 15 minutes. The second step is NIC amplification, in which the master mix containing the assembled NIC is amplified by PCR using 0.15 µM primers that hybridize to the 5′ and 3′ ends of the assembled NIC DNA. The second step uses 95 °C for two minutes, followed by 27 three-step thermal cycles of 95 °C for 20 seconds, 58 °C for 30 seconds and 68 °C for three minutes, followed by a final extension at 68 °C for 15 minutes.
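The two thermal programs above can be captured as data, which makes it easy to compare them or estimate run length. This is a minimal sketch; the `Step` and `Program` classes are hypothetical, and the computed time counts only programmed holds (block ramping between temperatures is ignored).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]
    n_cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Programmed hold time only; ramp time is ignored."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# Step 1: NIC assembly (2-minute 68 C step; 3 minutes for longer genes)
ASSEMBLY = Program(
    initial=Step(95, 120),
    cycle=(Step(95, 20), Step(45, 60), Step(68, 120)),
    n_cycles=5,
    final=Step(68, 900),
)

# Step 2: NIC amplification with primers at the 5' and 3' ends
AMPLIFICATION = Program(
    initial=Step(95, 120),
    cycle=(Step(95, 20), Step(58, 30), Step(68, 180)),
    n_cycles=27,
    final=Step(68, 900),
)

if __name__ == "__main__":
    for label, prog in (("assembly", ASSEMBLY), ("amplification", AMPLIFICATION)):
        print(f"{label}: {prog.hold_seconds() / 60:.1f} min of holds")
```

Run as a script, this reports about 33.7 minutes of holds for assembly and 120.5 minutes for amplification; actual instrument time will be longer because of ramping.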
Assembled NIC genes, prepared as described above, were gel purified using a QIAquick Gel Extraction kit (Qiagen), digested with EcoRI and XhoI (New England Biolabs), gel purified again using a QIAquick Gel Extraction kit (Qiagen) and ligated into pre-cut ZAP® II vector (Stratagene), following the procedure set out in Example 2, above.
Products were plated on NZY agar plates containing 0.2% AZCL-xylan substrate, and the xylanase activity of the plaques was scored after overnight incubation at 37 °C. The plates were then incubated for up to four hours at temperatures ranging from 37 °C to 70 °C, and the xylanase activity of each plaque was scored again. Based on the activity scores after the overnight incubation and after the second incubation, each plaque was assigned a phenotype. Plaques that developed a blue color after overnight incubation at 37 °C and remained blue after the second incubation at an elevated temperature were classified as permissive. Plaques that were inactive and did not develop a blue color after overnight incubation at 37 °C, but developed a blue color after the second incubation at an elevated temperature, were classified as switching. Plaques that were inactive both after overnight incubation at 37 °C and after the second incubation at an elevated temperature were classified as non-permissive. Based on the agar plate phenotype of the intein-modified xylanase bearing an intein at a specific site, insertion of the respective intein was classified as permissive (intein insertion does not interfere with protein function, or the intein is spliced out during the overnight incubation at 37 °C), non-permissive (intein insertion interferes with protein function under all conditions tested) or switching (xylanase activity is observed after the four-hour incubation at elevated temperature, but not after the overnight incubation at 37 °C).
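The three-way scoring rule can be written as a small decision function. This is a sketch under one stated assumption: a plaque that is blue after 37 °C but not after heating is not one of the three categories described, so it is reported here as "unclassified". The names `Phenotype` and `classify_plaque` are hypothetical.

```python
from enum import Enum


class Phenotype(str, Enum):
    PERMISSIVE = "permissive"
    SWITCHING = "switching"
    NON_PERMISSIVE = "non-permissive"
    UNCLASSIFIED = "unclassified"


def classify_plaque(blue_after_37c: bool, blue_after_heat: bool) -> Phenotype:
    """Score a plaque from its AZCL-xylan color after each incubation."""
    if blue_after_37c and blue_after_heat:
        return Phenotype.PERMISSIVE
    if not blue_after_37c and blue_after_heat:
        return Phenotype.SWITCHING
    if not blue_after_37c and not blue_after_heat:
        return Phenotype.NON_PERMISSIVE
    # Active at 37 C but inactive after heating: not a category in the text.
    return Phenotype.UNCLASSIFIED


if __name__ == "__main__":
    print(classify_plaque(False, True).value)
```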
Individual plaques were picked from the plates corresponding to each insertion site and excised as phagemids following the manufacturer's protocol (Stratagene). Briefly, the Lambda ZAP® II vector is designed to allow simple, efficient in vivo excision and recircularization of any insert cloned into the lambda vector, forming a phagemid that contains the cloned insert. To excise cloned inserts into a phagemid, isolated plaques are transferred to a sterile microcentrifuge tube containing 500 µL of SM buffer (Stratagene) and 20 µL of chloroform (Sigma). The tube is vortexed to release the phage particles into the SM buffer and then incubated for at least one hour at room temperature or overnight at 4 °C. After incubation, previously prepared XL1-Blue MRF′ cells (Stratagene) and SOLR™ cells (Stratagene) are centrifuged at 1,000 x g for several minutes. The pellets are resuspended in 25 mL of 10 mM MgSO4 to an OD600 of 1.0 (8 x 10^8 cells/mL). Once the cells are resuspended, 200 µL of XL1-Blue MRF′ cells at an OD600 of 1.0, 250 µL of the desired isolated phage stock (containing > 1 x 10^5 phage particles) and 1 µL of ExAssist® helper phage (Stratagene) (> 1 x 10^6 pfu/µL) are combined in a 15 mL polypropylene tube. The tube is incubated at 37 °C for 15 minutes to allow the phage to attach to the cells. After incubation, 3 mL of LB broth with supplements is added and the mixture is incubated for 2.5-3 hours at 37 °C with shaking. The mixture is then heated to 65-70 °C for 20 minutes to lyse the lambda phage particles and the cells. After lysis, the cell debris is pelleted by centrifuging the tube for 15 minutes at 1,000 x g, and the supernatant is decanted into a new sterile tube. This supernatant contains the excised phagemid packaged as filamentous phage particles.
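The quantities in the excision step imply approximate phage-to-cell ratios. The sketch below assumes that cell density scales linearly with OD600 using the 8 x 10^8 cells/mL conversion stated above, and it uses the stated lower bounds for phage numbers; the helper names are hypothetical, and the results are minimum ratios rather than measured multiplicities of infection.

```python
CELLS_PER_ML_AT_OD1 = 8e8  # conversion stated in the text for OD600 = 1.0


def cells_in(volume_ul: float, od600: float = 1.0) -> float:
    """Approximate cell count, assuming density scales linearly with OD600."""
    return od600 * CELLS_PER_ML_AT_OD1 * volume_ul / 1000.0


def phage_per_cell(pfu: float, cells: float) -> float:
    """Ratio of infectious particles to cells."""
    return pfu / cells


xl1_cells = cells_in(200)   # 200 uL of XL1-Blue MRF' at OD600 1.0
lambda_pfu_min = 1e5        # > 1 x 10^5 particles in 250 uL of phage stock
helper_pfu_min = 1 * 1e6    # 1 uL of helper phage at > 1 x 10^6 pfu/uL

print(f"cells: {xl1_cells:.2e}")
print(f"lambda per cell >= {phage_per_cell(lambda_pfu_min, xl1_cells):.2e}")
print(f"helper per cell >= {phage_per_cell(helper_pfu_min, xl1_cells):.2e}")
```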
To plate the excised phagemids, 200 µL of freshly grown SOLR™ cells (OD600 = 1.0) are mixed with 100 µL of the phage supernatant in a 1.5 mL microcentrifuge tube. This mixture is incubated at 37 °C for 15 minutes, and then 200 µL of the cell mixture is spread on LB-ampicillin agar plates (100 µg/mL) and incubated overnight at 37 °C. The resulting colonies contain the excised phagemid; each phagemid carries an ampicillin resistance marker that supports growth on ampicillin-containing medium. After PCR confirmation and DNA sequencing, phagemid clones were grown overnight in autoinduction medium (referred to herein as AIM; Overnight Express™ Instant TB Medium, available from Novagen). The cells were lysed with FastBreak™ lysis buffer (Promega) and tested for splicing by western blot.
Intein-modified xylanases were analyzed for plaque phenotype on NZY agar plates, and for accumulation of precursor and mature xylanase using a modified western blot procedure (described below in Example 5).
A Psp-pol intein (SEQ ID NO: 3) was inserted into P77853 at positions S112 (SEQ ID NO: 1696) and S124 (SEQ ID NO: 1697), which were predicted as insertion sites in Example 1 (above). The plaque phenotype was classified as permissive for S112 and non-permissive for S124. In western blots, S112 accumulated some precursor intein-modified xylanase and some mature xylanase, whereas S124 accumulated mainly precursor intein-modified xylanase. In addition to the predicted sites, the Psp-pol intein was also inserted into several other sites. Among the other sites tested, S63 (SEQ ID NO: 1692), S86 (SEQ ID NO: 1694), S95 (SEQ ID NO: 1695) and S178 (SEQ ID NO: 1698) produced plaques that were classified as switching phenotypes with Psp-pol. In western blots, these sites accumulated precursor intein-modified xylanase when not heated, and also mature xylanase after heat treatment of the phage lysate at 70 °C.
A Tag intein (SEQ ID NO: 90) was inserted into P77853 at positions S112, T113, S124, T134, T145, S158 and T199, which were predicted as insertion sites in Example 1 (above). Plaques expressing P77853 modified with the Tag intein were scored for phenotype as follows: S112 (non-permissive), T113 (non-permissive), S124 (non-permissive), T134 (permissive), T145 (switching), S158 (non-permissive) and T199 (non-permissive). Precursor Tag intein-modified xylanase accumulated for the insertions at S112, T113, S124, T134, T145, S158 and T199; however, only T145 and T199 accumulated mature xylanase. Other cleavage products were observed by western blot at other insertion sites.
A Tth intein (SEQ ID NO: 91) was inserted into xylanase P77853 at positions S112, T113, S124, T134, T145, S158 and T199, which were predicted as insertion sites in Example 1 (above). The plaque phenotypes of these positions were classified as follows: S112 (permissive), S124 (switching), T113 (non-permissive), T134 (switching), S158 (switching), T145 (non-permissive) and T199 (non-permissive). By western blot, some accumulation of precursor intein-modified xylanase was detected for the S112, S124, T113, T134, S158, T145 and T199 insertion sites. Mature xylanase was detected by western blot for S112, S124, T113, S158 and T145.
Mini-Psp-Pol inteins mPspM1L4 (SEQ ID NO: 7) and mPspM5L5 (SEQ ID NO: 36) were inserted into xylanase P77853 at insertion site S112, which was predicted as an insertion site in Example 1 (above). Plaques expressing intein-modified P77853 containing either mPspM1L4 or mPspM5L5 at S112 were classified as non-permissive and were not analyzed by western blot. Likewise, mini-Psp-Pol inteins mPspM1L4 (SEQ ID NO: 7), mPspM1L7 (SEQ ID NO: 10), mPspM2L5 (SEQ ID NO: 15), mPspM4L3 (SEQ ID NO: 27), mPspM5L2 (SEQ ID NO: 33), mPspM5L5 (SEQ ID NO: 36) and mPspM7L3 (SEQ ID NO: 48) generated non-permissive plaque phenotypes when inserted into P77853 at S67. In contrast, the same inteins (mPspM1L4 (SEQ ID NO: 7), mPspM1L7 (SEQ ID NO: 10), mPspM2L5 (SEQ ID NO: 15), mPspM4L3 (SEQ ID NO: 27), mPspM5L2 (SEQ ID NO: 33), mPspM5L5 (SEQ ID NO: 36) and mPspM7L3 (SEQ ID NO: 48)) generated permissive plaques when inserted into P77853 at S95 and S178.
A Psp-Pol intein (SEQ ID NO: 3) was inserted into xylanase 030700 at positions S215, S314 and S357, which were predicted in Example 1 (above). The plaque phenotype of the Psp-Pol intein inserted at these positions was classified as non-permissive for S215 and S314, but permissive for S357. In contrast, when mini-Psp-Pol inteins mPspM1L4 (SEQ ID NO: 7) and mPspM3L5 (SEQ ID NO: 22) were inserted at the same sites, S314 was classified as permissive, while S215 and S357 were classified as non-permissive.
A Tth intein (SEQ ID NO: 91) was inserted into xylanase 030700 at positions S95, T137, S215, T250, S314, S357 and S358, which were predicted in Example 1 (above). The plaque phenotype of phage expressing xylanase 030700 with the inserted Tth intein was classified as: S95 (permissive), T137 (non-permissive), S215 (non-permissive), T250 (non-permissive), S314 (permissive), S357 (non-permissive) and S358 (permissive).
An Mth intein (SEQ ID NO: 2) and a Tag intein (SEQ ID NO: 90) were separately fused to the C-terminus of xylanase 030700 in individual experiments, and the resulting intein-modified proteins were active after overnight incubation at 37 °C, indicating that C-terminal fusion of the Mth and Tag inteins was permissive with 030700.
A Tth intein (SEQ ID NO: 91) was inserted into xylanase 043097 at positions S47, S50, S103, T111, T126, S130, T134, T151, T152, S158, T164, S170, T208, S213 and S214, which were predicted in Example 1 (above). Phage plaques expressing Tth intein-modified xylanase 043097 were classified for phenotype as follows: S47 (permissive), T134 (non-permissive), T151 (non-permissive), T152 (non-permissive), S158 (non-permissive), T164 (non-permissive), S170 (non-permissive), T208 (non-permissive), S213 (permissive) and S214 (permissive). In western blot analysis, precursor Tth intein-modified xylanase 043097 was observed for insertion sites S47, S50, S103, T111, S130, T164, S213 and S214, and mature xylanase 043097 was observed for S47, S50, S103, S213 and S214. Lysates of phage expressing Tth intein-modified xylanase 043097 at positions T126, T134, T152 and S158 were not analyzed by western blot.
As shown above, inserting an intein at an insertion site predicted by the method described herein can result in an intein-modified protein that exhibits a switching phenotype. However, the method also yields permissive or non-permissive candidates, which may or may not be cleaved or spliced. In addition, inserting an intein at sites other than those identified by the method can also result in a switching phenotype. Nevertheless, the method enriches the pool of candidate insertion sites for those that are more likely to yield a switching phenotype.
Example 4 - Mutagenesis of intein-modified enzymes. Many methods of protein mutagenesis exist in the art; as non-limiting examples, the specific strategies described below were used to generate variant intein-modified enzymes.
Random mutations were introduced into a xylanase, an intein-modified xylanase or an intein from the examples above using a Mutazyme® mutagenesis kit (Stratagene). Each time a DNA template is amplified by Mutazyme®, there is a certain probability that a mutation will be introduced into the newly synthesized DNA. In practice, the mutation rate is controlled by varying the amount of template DNA and the number of PCR cycles. The mutagenic PCR procedure used here was optimized to introduce 1-2 amino acid mutations per intein when mutagenizing either the entire cassette or the intein-coding portion.
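One way to reason about a target of 1-2 amino acid mutations per intein is to ask what share of the library is left unmutated or carries several changes. The sketch below assumes, as a common simplification that the source does not state, that the number of mutations per clone follows a Poisson distribution with the target mean.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k mutations in a clone under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_with_at_least(k_min: int, mean: float) -> float:
    """Share of clones carrying k_min or more mutations."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


if __name__ == "__main__":
    for mean in (1.0, 2.0):  # the 1-2 mutation target from the text
        print(f"mean {mean}: unmutated {poisson_pmf(0, mean):.3f}, "
              f">=3 mutations {fraction_with_at_least(3, mean):.3f}")
```

Under this model, a mean of 1 leaves roughly a third of clones unmutated, which is one reason a library screened for rare switching variants may favor the upper end of the target range.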
For whole-cassette mutagenesis, five µg of NIC phagemid DNA was amplified by PCR for 10 cycles using the GeneMorph® II Random Mutagenesis Kit (Stratagene) with M13 forward and reverse primers, following the manufacturer's protocol. Briefly, five µg of the NIC phagemid DNA to be mutagenized is mixed with 1X PCR reaction buffer, 200 µM of each dNTP, 0.15 µM primers complementary to the ends of the NIC DNA and 2.5 units of Mutazyme® II DNA polymerase in a final volume of 50 µL, and cycled at 95 °C for two minutes, followed by 10 three-step thermal cycles of 95 °C for 20 seconds, 58 °C for 30 seconds and 68 °C for three minutes (one minute per kilobase of template), followed by a final extension at 68 °C for 15 minutes. The amplification step was followed by 10 cycles of PCR with cloning primers for each mutagenized NIC DNA using standard Taq polymerase. The resulting mutagenized NIC DNA library was gel purified using the QIAquick Gel Extraction kit (Qiagen), digested with EcoRI and XhoI (New England Biolabs), column purified with the MinElute PCR Purification kit (Qiagen), ligated into ZAP® II vector (Stratagene), packaged into lambda phage as described above, and plated on NZY agar as described above.
For intein mutagenesis, five µg of plasmid DNA encoding the intein was amplified by PCR for 10 cycles with intein end-specific primers using the GeneMorph® II Mutagenesis kit (Stratagene), following the manufacturer's protocol. Briefly, five µg of the intein DNA to be mutagenized is mixed with 1X PCR reaction buffer, 200 µM of each dNTP, 0.15 µM intein end-specific primers and 2.5 units of Mutazyme® DNA polymerase in a final volume of 50 µL, and cycled at 95 °C for two minutes, followed by 10 three-step thermal cycles of 95 °C for 20 seconds, 58 °C for 30 seconds and 68 °C for three minutes, followed by a final extension at 68 °C for 15 minutes. The mutagenized intein library was then gel purified using the QIAquick Gel Extraction kit (Qiagen). N-terminal and C-terminal xylanase fragments (N and C) were generated by PCR using standard Taq polymerase.
NIC DNA containing wild-type N and C and the mutagenized intein library (I) was assembled using the PCR procedure described above and cloned into ZAP® II vector for library screening on NZY agar plates, as described above.
For intein mutagenesis, a synthetic Tth intein (SEQ ID NO: 91) mutagenesis library was also made. This library was designed so that every single amino acid substitution was present at least once at each position of the Tth intein. Once designed, the library was synthesized by GenScript. N-terminal and C-terminal xylanase fragments (N and C) were generated by PCR using standard Taq polymerase. NIC DNA containing wild-type N and C and the synthetic mutagenized Tth library (I) was assembled using the PCR procedure described above and cloned for library screening.
The following mutagenized libraries were created by these procedures:
1. A whole-cassette mutagenized library, in which the cassette containing mini-Psp-Pol intein mPspM1L4 inserted in P77853 at site S67 was mutagenized;
2. An intein-mutagenized library, in which mutagenized mini-Psp-Pol intein mPspM1L4 was inserted into P77853 at site S67;
3. An intein-mutagenized library, in which a mixture of mutagenized mini-Psp-Pol inteins mPspM1L4, mPspM2L5, mPspM3L5, mPspM4L3, mPspM5L5, mPspM5L2 and mPspM7L3 was inserted into P77853 at site S67;
4. An intein-mutagenized library, in which mutagenized mini-Psp-Pol intein mPspM5L5 was inserted into P77853 at site S112;
5. A whole-cassette mutagenized library, in which the cassette containing the Tth intein inserted in P77853 at site T134 was mutagenized;
6. An intein-mutagenized library, in which mutagenized Tth intein was inserted into P77853 at site T134;
7. An intein-mutagenized library, in which mutagenized Tth intein was inserted into P77853 at site S158; and
8. An intein-mutagenized library, in which mutagenized mini-Psp-Pol intein mPspM3L5 was inserted into 030700 at sites S106, S215, S295, S314, S357 or S358.
Example 5 - Screening of intein-modified enzyme libraries. Mutagenized libraries were screened, and candidates were isolated, purified and confirmed. Individual libraries were titered (plaque-forming units, or pfu, per µL) by serial dilution in SM buffer (SM buffer can be prepared by mixing 5.8 g of NaCl, 2.0 g of MgSO4·7H2O, 50.0 mL of 1 M Tris-HCl (pH 7.5) and 5.0 mL of 2% (w/v) gelatin in a final volume of one liter, and sterilizing in an autoclave) and plating on NZY plates. For insertion sites classified with a non-permissive phenotype, such as mini-Psp-Pol intein mPspM1L4 at sites S67 and S112 of P77853, or several sites in 030700, high phage densities were used in screening: up to 10,000 pfu were plated with 500 µL of XL1-Blue MRF′ cells (OD600 = 0.5) on a 15 cm plate. For libraries derived from intein-modified enzymes that showed a switching phenotype (for example, libraries prepared from the Tth intein inserted at sites T134 and S158 of P77853), 2,000 pfu were screened per plate.
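The titer and per-plate loading follow from standard serial-dilution arithmetic. This is a minimal sketch; the function names are hypothetical and the plaque count in the example (150 plaques from 10 µL of a 10^-4 dilution) is made up for illustration. Only the 10,000 and 2,000 pfu per-plate loads come from the text.

```python
def titer_pfu_per_ul(plaques: int, plated_ul: float, dilution: float) -> float:
    """Titer of the undiluted stock in pfu/uL.

    dilution is the fraction of stock in the plated sample, e.g. 1e-4
    for a 10^-4 dilution.
    """
    if plaques <= 0 or plated_ul <= 0 or dilution <= 0:
        raise ValueError("need a positive plaque count, volume and dilution")
    return plaques / (plated_ul * dilution)


def stock_ul_for(target_pfu: float, titer: float) -> float:
    """Volume of undiluted stock that carries target_pfu."""
    return target_pfu / titer


# Hypothetical count: 150 plaques from 10 uL of a 10^-4 dilution.
titer = titer_pfu_per_ul(150, 10, 1e-4)
for target in (10_000, 2_000):  # per-plate loads used in the text
    print(target, "pfu ->", round(stock_ul_for(target, titer), 4), "uL of stock")
```

When the required stock volume is well below 1 µL, as in this example, plating from an intermediate dilution in SM buffer is the practical route.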
Each library was plated on agar plates and incubated at 37 °C overnight. Plaques with a blue halo, representing mutants with a permissive phenotype, were marked. The plates were then subjected to a series of heat treatments (50 °C for 2 hours and then 70 °C for 2 hours) to induce the phenotype of candidate phage plaques. Individual plaques were picked and suspended in 500 µL of SM buffer. Serial dilutions in SM buffer were made and used to infect XL1-Blue MRF′ cells, which were then plated onto NZY plates. The plates were incubated overnight at 37 °C and then at 70 °C for 2 hours. Plaque phenotypes were confirmed after incubation at both temperatures.
More than 500 intein-modified P77853 xylanase candidates were isolated, purified and confirmed for phenotype. Among them, about 100 include a mini-Psp-Pol intein insertion at site S67, 70 include an M5L5 intein insertion at site S112, 250 include a Tth intein insertion at site T134, and 75 include a Tth intein insertion at site S158. For xylanase 030700, about 50 picks were subjected to plaque purification, phenotype confirmation and PCR confirmation.
Candidates with confirmed phenotypes were individually excised as phagemids following the procedure described above. Most candidates were analyzed by enzyme assay. Candidates exhibiting temperature-sensitive switching activity were analyzed by western blot (splicing) assay and by DNA sequence analysis.
Enzymatic assays for xylanase activity were conducted as follows:
1) Cultures were inoculated from a single colony containing an excised phagemid and grown overnight at 37 °C and 300 rpm in 1 mL of Luria broth (LB can be prepared by mixing 10 g of NaCl, 10 g of Bacto-tryptone and 5 g of yeast extract in a final volume of one liter, adjusting the pH to 7.0 with 5 N NaOH and sterilizing in an autoclave) supplemented with 100 mg/L ampicillin (AMP, obtained from Sigma).
2) 50 µL of cells were transferred to 5 mL of Overnight Express™ Instant TB medium (also referred to herein as autoinduction medium, or AIM; available from Novagen) and grown overnight at 30 °C and 250 rpm.
3) Cultures were centrifuged at 3,000 rpm for 15 min.
4) The supernatant was removed and the cell pellets were resuspended in 200 µL of lysis buffer (1x FastBreak™ Lysis Buffer (Promega), 200 mM sodium phosphate pH 6.5 and 0.2 µL DNase/mL).
5) The lysate was mixed thoroughly and a 1:10 dilution of the lysate was made in 200 mM sodium phosphate pH 6.5.
6) 100 µL of each dilution was used for the activity assays, which were conducted on samples that were either exposed to splicing-inducing conditions, such as a heat pretreatment, or not exposed to such conditions.
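The per-liter LB recipe and the 100 mg/L ampicillin supplement scale linearly with volume. This is a minimal sketch; the function names and the 100 mg/mL ampicillin stock concentration are hypothetical choices, while the recipe amounts and the final ampicillin concentration come from the text. The stock volume is obtained by solving C1·V1 = C2·V2.

```python
LB_G_PER_L = {"NaCl": 10.0, "Bacto-tryptone": 10.0, "yeast extract": 5.0}


def scale_recipe(per_liter: dict, volume_ml: float) -> dict:
    """Amounts needed for volume_ml, given amounts per liter."""
    return {name: amount * volume_ml / 1000.0 for name, amount in per_liter.items()}


def stock_volume_ul(final_mg_per_l: float, final_ml: float,
                    stock_mg_per_ml: float) -> float:
    """C1*V1 = C2*V2 solved for the stock volume, returned in uL."""
    mass_mg = final_mg_per_l * final_ml / 1000.0
    return mass_mg / stock_mg_per_ml * 1000.0


if __name__ == "__main__":
    print(scale_recipe(LB_G_PER_L, 250))      # grams for 250 mL of LB
    # 1 mL overnight culture at 100 mg/L ampicillin from a hypothetical
    # 100 mg/mL stock.
    print(stock_volume_ul(100, 1, 100), "uL of ampicillin stock")
```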
For pretreatment (PT) assays, lysate samples were distributed into equal-volume aliquots, which were incubated at 37 °C or 55 °C for 4 hours and then cooled on ice. 20 µL of finely ground 0.2% AZCL substrate was then added and the samples were mixed well. The reactions were allowed to proceed at 37 °C for at least one hour, and sometimes overnight. Depending on the intein-modified enzyme and its respective mature enzyme, reaction times, temperatures, conditions and substrates may vary.
For non-pretreatment (NPT) assays, the samples were distributed into equal-volume aliquots and mixed with 20 µL of finely ground 0.2% AZCL substrate. The reactions were allowed to proceed at 37 °C and 70 °C for up to 6 hours. Depending on the intein-modified enzyme and its respective mature enzyme, reaction times, temperatures, conditions and substrates may vary.
In both pretreatment (PT) and non-pretreatment (NPT) assays, after the reaction time was complete, the samples were vortexed and then centrifuged at 4,000 rpm for 7 minutes. From each sample, 50 µL of supernatant was used to measure absorbance at 590 nm, which indicates how active an enzyme or intein-modified enzyme was in the sample. Absorbance was measured either on a Thermo Scientific spectrophotometer or on a BioTek Synergy™ multi-mode microplate reader in 96- or 384-well round-bottom assay plates. If necessary, the samples were centrifuged again to ensure that no cell debris was collected, and 5x or 10x dilutions in 200 mM sodium phosphate pH 6.5 were made when necessary.
Western blot analysis of candidate intein-modified enzyme mutants was conducted as follows:
1) 5 mL of an AIM culture was grown overnight at 30 °C and 250 rpm and then centrifuged at 3,000 rpm for 15 min.
2) The supernatant was removed and the pelleted cells were resuspended in 200 µL of lysis buffer (see above).
3) The lysate was mixed thoroughly and a 1:50 dilution was made using 1x phosphate-buffered saline (PBS can be prepared by mixing 137 mmol of NaCl, 2.7 mmol of KCl, 4.3 mmol of Na2HPO4 and 1.47 mmol of KH2PO4 in a final volume of one liter, adjusting the pH to 7.4 with 2 N NaOH and filter-sterilizing the solution through a 0.22 micron filter), while the rest of the unused sample was stored at -20 °C (a higher dilution may be required, depending on expression levels and activities).
4) For each dilution, 50 µL was transferred to a sterile centrifuge or PCR tube and heat treated at 37 °C or 59 °C for 4 hours (the volume may vary as needed, but a minimum of 15-25 µL is recommended).
5) An equal volume of 2X loading buffer (62.5 mM Tris-Cl pH 6.8, 6 M urea, 10% glycerol, 2% SDS, 0.0125% bromophenol blue and 5% BME) was added.
6) A biotinylated ladder was prepared with an equal volume of urea (the ladder volume can be calculated by multiplying the number of gels to be used by 20 µL for an 18-well gel (Bio-Rad), or by 15 µL for a 26-well gel (Bio-Rad)).
7) The samples were vortexed well and then loaded onto the gel (30 µL per sample for an 18-well Bio-Rad gel, and 20 µL per sample for a 26-well Bio-Rad gel).
8) The gel was run at 150-175 V for 1 hour and then disassembled.
9) The gel was soaked in 1x transfer buffer (Towbin; 25 mM Tris base, 192 mM glycine and 20% methanol) for 15 min.
10) A Whatman paper - PVDF membrane (pre-wetted in methanol) - gel - Whatman paper sandwich was assembled, and the sample was transferred by electroblotting at 15 V and less than 600 mA for 1 hour.
11) The blot was removed and placed in a blocking solution containing 2% BSA in TBST (50 mM Tris-HCl, 150 mM NaCl, 0.1% Tween-20).
12) The blot was kept in the blocking solution overnight at 4 °C.
13) The blocking solution was decanted and a primary antibody solution (1% BSA in TBST with a 1:2,000 dilution of a primary antibody that recognizes the enzyme and the intein-modified enzyme being detected) was added.
14) The blot was washed with TBST 5 times for 5 minutes per wash.
15) A secondary antibody solution (1% bovine serum albumin (BSA) in TBST with 1:20,000 HRP-conjugated anti-biotin antibody and 1:5,000 HRP-conjugated anti-rabbit secondary antibody) was added, and the blot was then washed with TBST 5 times for 5 minutes each.
16) The blot was immersed in 20 µL of SuperSignal® West Pico Chemiluminescent Substrate (Pierce) for 5 min and then imaged on a G:Box™ gel imaging system (Syngene) using 20 successive exposures at 1-minute intervals under the Chemi setting.
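The PBS recipe is given in millimoles per liter; converting to grams needs molar masses. This is a minimal sketch assuming anhydrous salts; the molar masses are standard values supplied here rather than taken from the text, and hydrated forms (for example Na2HPO4·7H2O) would need their own molar masses. The dictionary and function names are hypothetical.

```python
# Molar masses (g/mol) of the anhydrous salts; an assumption, not from the text.
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09}

# PBS composition from the text, in mmol per liter of final volume.
PBS_MMOL_PER_L = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 4.3, "KH2PO4": 1.47}


def grams_for(mmol_per_l: dict, volume_l: float = 1.0) -> dict:
    """Mass of each salt (g) for volume_l liters of solution."""
    return {salt: mmol * volume_l * MOLAR_MASS[salt] / 1000.0
            for salt, mmol in mmol_per_l.items()}


if __name__ == "__main__":
    for salt, grams in grams_for(PBS_MMOL_PER_L).items():
        print(f"{salt}: {grams:.3f} g per liter")
```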
DNA sequencing was performed by routine methods in the art.
About 40 candidates from library (1) (mini-Psp-Pol intein mPspM1L4 in P77853 at site S67, whole-cassette mutagenesis) were generated and analyzed by western blot and by DNA sequencing. More than half of the sequenced candidates had a stop codon in the C-extein, within or just after the linker sequence between the substrate-binding domain and the catalytic domain. Whole-cassette mutagenesis thus tended to create many candidates with a truncated P77853 protein lacking the entire carbohydrate-binding domain at the S67 insertion site. Although spliced mature xylanase was observed in a few candidates (m25, m30), only a few additional candidates showed the cleavage product (such as m3).
Intein mutagenesis was more efficient at creating amino acid substitutions. Under the mutagenic PCR conditions tested, an average of 4 amino acid substitutions was observed in mini-Psp-Pol candidates at both the S67 and S112 sites of P77853. These mutations led to precursor cleavage, but not to intein splicing, in most mini-Psp-Pol candidates.
The Tth intein produced temperature-sensitive intein-modified P77853 xylanases on NZY agar plates, in enzyme assays and in terms of spliced product accumulation in western blots. Based on this result, Tth intein-modified xylanase candidates were further characterized.
To accurately measure the switching and temperature-sensitive splicing activity of a large number of candidates, the optimal switching conditions (temperature and time) for these candidates were determined. First, a few candidates were tested for the effect of heat pretreatment conditions on xylanase activity. Of the temperatures tested (30 °C, 37 °C, 45 °C, 55 °C and 70 °C) and times tested (0.5, 1, 2, 3, 4, 6 and 20 hours), 55 °C for 4 hours was found to be best. Several candidates were then tested over a much narrower temperature range around 55 °C for 4 hours, and the optimum temperature was found to be 59 °C for all Tth candidates tested under these conditions.
FIGS. 3A to 3L illustrate western blot data for Tth intein-modified P77853, in which the intein is inserted at serine 158 (S158) or threonine 134 (T134) of the P77853 enzyme. The agar plate phenotype of each sample is indicated at the top of its lane. Agar plate phenotypes are given as "SW" for a switching phenotype, "TSP" for a temperature-sensitive switching splicer phenotype and "P" for a permissive phenotype.
FIG. 3A illustrates a western blot of protein P77853-Tth-S158-2 (SEQ ID NO: 1672), which showed a switching phenotype in the agar plate assay. FIG. 3B illustrates a western blot of protein P77853-Tth-S158-4 (SEQ ID NO: 1673), which also had a switching phenotype in the agar plate assay. FIG. 3C illustrates a western blot of protein P77853-Tth-S158-7 (SEQ ID NO: 1674), which also had a switching phenotype in the agar plate assay. FIG. 3D illustrates a western blot of protein P77853-Tth-S158-19 (SEQ ID NO: 1675), which showed a temperature-sensitive switching splicer phenotype. FIG. 3E illustrates a western blot of protein P77853-Tth-S158-20 (SEQ ID NO: 1676), which had a permissive phenotype in the agar plate assay. FIG. 3F illustrates a western blot of protein P77853-Tth-S158-21 (SEQ ID NO: 1677), which had a switching phenotype in the agar plate assay. FIG. 3G illustrates a western blot of protein P77853-Tth-S158-25 (SEQ ID NO: 1678), which had a temperature-sensitive switching splicer phenotype. FIG. 3H illustrates a western blot of protein P77853-Tth-S158-38 (SEQ ID NO: 1679), which had a temperature-sensitive switching splicer phenotype. FIG. 3I illustrates a western blot of protein P77853-Tth-S158-39 (SEQ ID NO: 1680), which had a temperature-sensitive switching splicer phenotype. FIG. 3J illustrates a western blot of protein P77853-Tth-S158-42 (SEQ ID NO: 1681), which had a temperature-sensitive switching splicer phenotype. FIG. 3K illustrates a western blot of protein P77853-Tth-S158-138 (SEQ ID NO: 1691), which had a temperature-sensitive switching splicer phenotype.
FIG. 3L illustrates a western blot of protein P77853-Tth-T134-1 (SEQ ID NO: 1629) (panel 1), protein P77853-Tth-T134-2 (SEQ ID NO: 1630) (panel 2), protein P77853-Tth-T134-3 (SEQ ID NO: 1631) (panel 3), protein P77853-Tth-T134-9 (SEQ ID NO: 1632) (panel 9), protein P77853-Tth-T134-91 (SEQ ID NO: 1644) (panel 91), protein P77853-Tth-T134-48 (SEQ ID NO: 38) (panel 48), protein P77853-Tth-T134-80 (SEQ ID NO: 1640) (panel 80) and protein P77853-Tth-T134-95 (SEQ ID NO: 1645) (panel 95), which were pretreated by heating at 37 °C (left lane of each panel) or 70 °C (right lane of each panel) for one hour. Also shown are lanes containing protein from the empty vector control (VCT) and wild-type P77853 protein (P77), which were heat treated in the same way. The phenotype of each protein is given above its corresponding lanes.
Based on data from both the enzyme assays and the western blots of FIGS. 3A to 3L, incubation at temperatures between 55 °C and 70 °C for 4 hours increases intein splicing in many of the Tth intein-modified P77853 candidates.
T134 candidates with increased intein splicing in western blots were tested in liquid assays using pretreatment (PT) at 37 °C or 59 °C for four hours, followed by a 12-hour reaction with substrate at 37 °C. Alternatively, each candidate was assayed without pretreatment (NPT), with a 5-hour reaction at either 37 °C or 70 °C. The results are tabulated in Table 3, below. Activity was quantified in an assay that measures the release of a dye from a labeled substrate and is expressed in arbitrary absorbance units, measured in a spectrophotometer or plate reader at a wavelength of 590 nm. The percentage in parentheses in the 59 °C column indicates the change in activity after PT at 59 °C compared with PT at 37 °C, calculated as Percent Change = ([(activity after PT at 59 °C)/(activity after PT at 37 °C)] - 1) x 100. ND means not determined. Table 3



Additional P77853 xylanases modified with intein at the T134 insertion site are set forth in SEQ ID NOS: 1711-1712.
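The percent-change figure reported in parentheses in the 59 °C column of Table 3 can be computed with a small function. This is a minimal sketch; the names `percent_change` and `table_cell`, the three-decimal formatting and the choice to print "ND" when a reading is missing are illustrative assumptions, and the example values are made up rather than taken from Table 3.

```python
from typing import Optional


def percent_change(pt59: float, pt37: float) -> float:
    """([(activity after PT at 59 C) / (activity after PT at 37 C)] - 1) x 100."""
    if pt37 <= 0:
        raise ValueError("PT 37 C activity must be positive")
    return (pt59 / pt37 - 1.0) * 100.0


def table_cell(pt59: Optional[float], pt37: Optional[float]) -> str:
    """Format a 59 C column entry as 'activity (change%)', or 'ND'."""
    if pt59 is None or pt37 is None:
        return "ND"
    return f"{pt59:.3f} ({percent_change(pt59, pt37):+.0f}%)"


if __name__ == "__main__":
    print(table_cell(1.0, 0.5))   # made-up readings
    print(table_cell(None, 0.5))
```

A value of +100% corresponds to a 2-fold increase, so the formula and a fold-change threshold express the same comparison.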
Using the pretreatment (PT) assay described above, the switching profile for temperature-induced reactivation of xylanase was analyzed for more than 300 P77853 xylanase candidates modified with the Tth intein, expressed in E. coli SOLR™ cells (Stratagene). Xylanase activity data were collected in duplicate for all samples, with or without heat pretreatment. For heat pretreatment, one set of samples was incubated at 37 °C and the other at 59 °C, both for 4 hours. After the samples were cooled on ice, AZCL-xylan substrate was added and the mixture was incubated at 37 °C for up to 12 hours. The AZCL-xylan substrate was added directly to two other sets of samples, without prior heating, and these were reacted at 37 °C for 5 hours. Results for the P77853 xylanase modified with the Tth intein at S158 are shown in Table 4, below. Although samples preheated to 59 °C generally had improved activity, almost a third of all Tth intein-modified P77853 xylanase candidates showed at least a 2-fold increase in activity after heat pretreatment at 59 °C compared with 37 °C. In other words, the activity measured at 37 °C was often twice as high for samples pretreated at 59 °C as for samples pretreated at 37 °C. These candidates were subsequently analyzed by western blot. Activity is expressed in arbitrary absorbance units as measured in a plate reader at a wavelength of 590 nm. The percentage in parentheses in the 59 °C column indicates the change in activity after PT at 59 °C compared with PT at 37 °C, calculated as Percentage Fold Change = ([(activity after PT at 59 °C) / (activity after PT at 37 °C)] - 1) × 100%. ND means not determined. Table 4


Additional P77853 xylanases modified with intein at the S158 insertion site were developed, including those in SEQ ID NOS: 1700-1710.
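The 2-fold screening criterion described above can be expressed as a filter over the duplicate readings. This Python sketch assumes that duplicate readings are averaged before the ratio is taken, which the text does not state; the function name, candidate names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Each candidate maps to (duplicate A590 readings after PT at 37 °C,
#                         duplicate A590 readings after PT at 59 °C).
Readings = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def select_heat_activated(readings: Readings, min_fold: float = 2.0) -> List[str]:
    """Return candidates whose mean activity after PT at 59 °C is at least
    `min_fold` times their mean activity after PT at 37 °C."""
    selected = []
    for candidate, (pt37, pt59) in readings.items():
        baseline = mean(pt37)
        if baseline > 0 and mean(pt59) / baseline >= min_fold:
            selected.append(candidate)
    return sorted(selected)


# Hypothetical candidate names and readings, for illustration only.
example = {
    "cand-1": ([0.10, 0.12], [0.30, 0.26]),  # about 2.5-fold: selected
    "cand-2": ([0.40, 0.42], [0.50, 0.48]),  # about 1.2-fold: not selected
}
print(select_heat_activated(example))  # ['cand-1']
```

Candidates with no measurable activity after PT at 37 °C are skipped here rather than treated as infinitely activated; other conventions are possible.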
A time-course splicing assay was performed, and splicing was checked by western blot for each of the intein-modified P77853 candidates in the tables above, with the intein inserted at either T134 or S158. FIG. 4A illustrates the time-course splicing assay for sample S158-19. Protein extracts were incubated at 59 °C for six hours, with samples taken at 0, 1, 2, 3, 4 and 6 hours, as indicated in FIG. 4A. The right side of FIG. 4A shows the empty expression vector control and the wild-type P77853 positive control, as well as molecular weight standards. For the Tth intein-modified P77853 xylanase candidate S158-19, which accumulated precursor protein at a high level, the decrease in the level of intein-modified enzyme precursor correlated directly with the accumulation of spliced mature protein. Accumulation of spliced mature xylanase peaked at 4 hours when the samples were heat-treated at 59 °C. As the incubation time increased, the amount of Tth intein-modified S158-19 P77853 (NIC) decreased while the amount of P77853 increased, indicating increased intein splicing as the 59 °C incubation progressed. Similarly, FIG. 4B illustrates a western blot analysis of the Tth intein-modified P77853 xylanase S158-30-103 (SEQ ID NO: 1701). Protein samples were incubated at 37 °C, 50 °C, 59 °C or 65 °C for different lengths of time (1, 2, 3, 4 and 6 hours), as indicated in FIG. 4B. The empty vector control samples and wild-type P77853 are shown on the far right together with a molecular weight ladder. FIG. 4B shows that as time and temperature increase, the formation of mature P77853 enzyme (NC) increases while the amount of Tth intein-modified S158-30-103 xylanase (NIC) decreases. Likewise, FIG. 4C illustrates a western blot analysis of the Tth intein-modified P77853 xylanase T134-100-101 (SEQ ID NO: 1711).
Protein samples were incubated at 37 °C, 50 °C, 59 °C or 65 °C for different lengths of time (1, 2, 4, 6 and 17 hours). The empty vector control samples and wild-type P77853 are shown on the far right together with a molecular weight ladder. FIG. 4C shows that as time and temperature increase, the formation of wild-type P77853 (NC) increases while the amount of Tth intein-modified T134-100-101 xylanase (NIC) decreases, indicating increased intein splicing.
Unlike the activity-based pretreatment assay, which provides a quantitative measurement of enzyme reactivation upon thermal pretreatment, a western blot splicing assay offers the advantage of demonstrating splicing visually. About 90 intein-modified enzyme candidates that performed well in the pretreatment assay were analyzed by western blot. For each candidate analyzed, a splicing profile was established. A splicing profile consists of the precursor level, precursor stability, level of mature spliced protein and level of cleavage product, each at two temperatures (usually selected from room temperature, 25 °C, 37 °C, 50 °C, 55 °C, 59 °C, 65 °C, or other temperatures, as desired). For some intein-modified proteins, samples were taken at intervals during thermal pretreatment and subjected to western blotting to investigate the splicing kinetics.
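A splicing profile as defined above can be held in a small data structure. This is a hypothetical Python sketch: the class and field names, the use of relative band intensities, and the comparison heuristic (more mature protein and less precursor at the higher temperature, the trend described for FIGS. 4A to 4C) are assumptions, not the method used to score the candidates.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlotReadout:
    """Western blot readout for one candidate at one temperature.

    Levels are assumed to be relative band intensities (e.g. from densitometry).
    """
    precursor: float          # intein-modified precursor (NIC)
    precursor_stable: bool    # whether the precursor band persists during incubation
    mature_spliced: float     # mature spliced protein (NC)
    cleavage_product: float   # intein cleavage product(s)


@dataclass
class SplicingProfile:
    candidate: str
    readouts: Dict[float, BlotReadout] = field(default_factory=dict)  # keyed by °C

    def add(self, temperature_c: float, readout: BlotReadout) -> None:
        self.readouts[temperature_c] = readout

    def splicing_increases(self, low_c: float, high_c: float) -> bool:
        """Heuristic: more mature protein and less precursor at the higher temperature."""
        low, high = self.readouts[low_c], self.readouts[high_c]
        return high.mature_spliced > low.mature_spliced and high.precursor < low.precursor


profile = SplicingProfile("hypothetical-candidate")
profile.add(37.0, BlotReadout(1.0, True, 0.1, 0.05))
profile.add(59.0, BlotReadout(0.4, False, 0.6, 0.10))
print(profile.splicing_increases(37.0, 59.0))  # True
```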
Amino acid mutations capable of enhancing intein switching and splicing were identified from DNA sequence data for some intein-modified enzymes, as described below. These mutations were specific to the particular intein-modified enzyme, as defined by a single target protein, a single intein and a single insertion site.
Switching candidates and TSP candidates among the P77853 xylanase candidates modified with the Tth intein were subjected to DNA sequencing, together with Tth intein-modified P77853 xylanase candidates that showed splicing in the western blot analysis. Mutations in Tth intein residues and in P77853 residues at the intein-extein junction that are associated with enhanced switching and splicing were identified. For candidates generated by inserting the Tth intein into P77853 at the T134 site, a mutation of the Tth intein from P71 (amino acid 71 of the Tth intein) to L, T or Q (SEQ ID NOS: 1928, 1929 and 1930) is associated with a TSP phenotype. A single insertion at P136 (position +3 of the C-extein) was also associated with a TSP phenotype (SEQ ID NO: 1931). The two types of mutation (P71 to L/T/Q, and insertion at P136) did not occur together in any of the TSP candidates that were sequenced. Candidates with the insertion at P136 carried additional mutations, most notably an S-to-V substitution at the S135 site (position +2 of the C-extein; SEQ ID NO: 1932). These double mutants were also classified as belonging to the TSP family. The remaining candidates (of 61) exhibited a switching phenotype, but temperature-sensitive splicing was difficult to detect.
Intein-modified P77853 xylanases constructed by inserting the intein at S158 were analyzed, and several TSP constructs were identified. Seventeen R51G (S) substitutions (amino acid 51 of the Tth intein) in the Tth intein (SEQ ID NO: 91) were identified (SEQ ID NOS: 1675, 1678-1681, 1689, 1691, 1700-1708 and 1710), and all were associated with TSP. The sequencing data suggest that these intein mutations, which correlate with the TSP phenotype, play a role in the temperature-dependent splicing of Tth intein-modified P77853 xylanases when the intein is inserted at these specific locations. Additional evidence supporting the role of TSP elements in splicing comes from structural analysis of mutations on the surface of the intein: both R51 and P71 of the Tth intein are predicted to lie close to the intein-extein junction and, therefore, to the active site for intein splicing and cleavage.
Summary of results from Examples 1-5.
One xylanase, P77853, was modified with an intein and analyzed as shown above. Multiple libraries of P77853 modified with mutagenized inteins were created by inserting a mutagenized intein into the enzyme. Multiple mutagenized inteins and multiple intein insertion sites were used to create the libraries. Each modified enzyme in a library had a single mutagenized intein inserted at a single insertion site. From about 10 million mutants in the libraries, 500 candidates were isolated. Candidates were analyzed by DNA sequencing and by enzyme activity assay, and were examined for temperature-sensitive activity and for changes in splicing. Heat pretreatment at a temperature close to 60 °C was found to often induce switching, that is, a change in the activity of the intein-modified enzyme. In some candidates, switching correlated with intein splicing. Changes at particular amino acids in inteins and exteins, especially near the intein-extein junction, were also found to be significant in enhancing intein splicing or temperature sensitivity. These amino acid changes depend on the specific intein, target enzyme and insertion site.
Insertion of the Tth intein into P77853, which does not contain an intein in its native sequence, resulted in TSP switching phenotypes, as described in the examples above. The T134 site of P77853 is located in the region of a β-sheet and a loop, and the SVM classification ranks it among the top 5 sites by predicted splicing probability. In addition, increased splicing occurs with a mutation close to the insertion site that introduces a +2 proline, which correlates with a higher SVM classification. Insertion of the Tth intein at the S158 site of P77853, which is the seventh-closest site to the active-site residues (only 6.6 Å away) and which also lies at the junction of a β-sheet and a loop region, resulted in intein-modified candidates that were capable of temperature-dependent splicing and showed both switching and TSP phenotypes.
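The site-selection criteria summarized for this method (a C, T or S residue rated 0.75 or higher by a support vector machine, within ten angstroms of the active site, and at or near a loop-sheet or loop-helix junction) can be sketched as a filter. This Python sketch is hypothetical: it assumes the SVM scores, distances and secondary-structure labels have already been computed elsewhere, simplifies "at or near" to an exact junction label, and uses invented sites and scores.

```python
from dataclasses import dataclass
from typing import Iterable, List

NUCLEOPHILES = {"C", "T", "S"}
JUNCTIONS = {"loop-sheet", "loop-helix"}  # assumed labels for the secondary-structure context


@dataclass
class Site:
    label: str                   # residue letter + position, e.g. "S100"
    svm_score: float             # SVM splicing score
    active_site_distance: float  # distance to the active site, in angstroms
    context: str                 # secondary-structure junction at the site


def select_insertion_sites(sites: Iterable[Site], min_score: float = 0.75,
                           max_distance: float = 10.0) -> List[str]:
    """Keep C/T/S sites scoring >= min_score, within max_distance Å of the
    active site, and at a loop-sheet or loop-helix junction."""
    return [
        s.label for s in sites
        if s.label[0] in NUCLEOPHILES
        and s.svm_score >= min_score
        and s.active_site_distance <= max_distance
        and s.context in JUNCTIONS
    ]


# Hypothetical sites and scores, for illustration only.
sites = [
    Site("S100", 0.82, 6.6, "loop-sheet"),   # passes all three criteria
    Site("T200", 0.91, 15.0, "loop-helix"),  # too far from the active site
    Site("C300", 0.60, 4.0, "loop-sheet"),   # SVM score too low
    Site("A400", 0.95, 3.0, "loop-helix"),   # not a C/T/S residue
]
print(select_insertion_sites(sites))  # ['S100']
```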
Examples of intein-modified xylanases are provided in SEQ ID NOS: 1629-1712.
Example 6. Examples of intein-modified cellulases are provided in SEQ ID NOS: 1713-1784.
Example 7 - Cellulase assays and purification. Ace1 cellulase (endoglucanase E1 from Acidothermus cellulolyticus 11B) is an endoglucanase (EC 3.2.1.4) from Acidothermus cellulolyticus (GenBank accession P54583). The enzyme has an N-terminal catalytic domain (CD) with homology to members of glycosyl hydrolase family 5, and a C-terminal cellulose-binding domain with homology to carbohydrate-binding module family 2 (CBM2). The CD and CBM2 domains of P54583 are joined by a linker domain rich in serine, threonine and proline. P54583 has previously been expressed in heterologous systems, including plants, and has been shown to hydrolyze plant-derived cellulosic material effectively.
P54583 expression and characterization. Referring to FIG. 5, the plasmids pGAPZα and pAL410 are illustrated with cellulase inserts. Plasmids are not drawn to scale. In FIG. 5, the annotations have the following meanings: P-GAP, the nominally constitutive yeast GAP promoter; alpha, the yeast alpha mating factor secretion signal, which is translated as an N-terminal fusion to the endoglucanase; P54583, coding sequence for the Ace1 endoglucanase (see below); AOXt, transcriptional terminator and polyadenylation signal derived from the yeast AOX gene; P-TEF1, promoter from the yeast TEF1 gene; P-EM7, the synthetic prokaryotic EM7 promoter; zeo, coding sequence that confers zeocin resistance in yeast and E. coli; CYC1t, transcriptional terminator and polyadenylation signal derived from the yeast CYC1 gene; ColE1, a region that allows replication of the plasmid in E. coli; f1 ori, sequence for generating single-stranded plasmid derivatives; KanMX, a gene that confers G418 resistance in yeast; 2µ ori, the 2-micron origin, allowing plasmid replication in yeast cells; bla, a gene that confers ampicillin resistance in bacterial cells. Note that P54583 is expressed from pGAPZα-P54583 and pAL410-P54583 as a translational fusion to C-terminal 6His and myc tags.
A codon-optimized version of P54583 was prepared. The DNA sequence of P54583, as optimized for expression in plants, is shown below. Note that this sequence corresponds only to amino acid residues 42 to 562 of the native A. cellulolyticus polypeptide, which constitute the "mature" form of the endoglucanase and lack the signal peptide (amino acid residues 1 to 41). The GCT codon following the start ATG codon encodes amino acid 42.
codon optimized version of P54583 ATGGCTGGAGGAGGATACTGGCACACTTCCGGCAGGGAGATCCTCGACGCA AATAACGTTCCAGTCAGAATCGCCGGGATTAATTGGTTTGGCTTCGAAACGT GTAACTACGTGGTTCACGGCCTGTGGTCTCGGGATTACAGATCAATGCTCGA CCAGATCAAATCCTTGGGGTATAATACAATTAGGCTGCCCTACAGCGATGAC ATTCTTAAGCCTGGAACCATGCCGAACTCGATTAATTTCTACCAAATGAACCA GGATCTGCAGGGATTGACTTCTCTGCAGGTTATGGACAAGATCGTGGCGTAC GCCGGCCAAATCGGGCTCAGAATTATTTTGGATCGGCACAGGCCAGACTGCT CAGGTCAGTCGGCCCTGTGGTACACAAGCTCCGTGTCAGAGGCAACATGGAT TTCAGATCTTCAAGCCCTCGCACAACGCTATAAAGGCAACCCCACGGTTGTG GGATTCGACCTTCACAACGAACCTCACGATCCGGCCTGTTGGGGCTGCGGGG ACCCTTCGATCGACTGGAGACTGGCAGCGGAGAGGGCTGGTAACGCCGTTCT CAGCGTCAATCCCAACTTGCTGATCTTTGTGGAGGGAGTTCAGTCCTACAAC GGCGATTCTTACTGGTGGGGCGGAAATCTCCAAGGCGCAGGGCAGTATCCTG TCGTGCTTAACGTTCCGAATCGCCTGGTCTACTCAGCACACGACTACGCGAC TAGCGTGTACCCACAGACGTGGTTCTCCGATCCCACATTTCCTAACAATATGC CGGGAATCTGGAACAAGAATTGGGGTTACTTGTTTAACCAAAACATTGCTCC AGTTTGGTTGGGTGAATTTGGCACCACTCTTCAGTCGACGACAGACCAAACC TGGCTGAAAACCCTCGTCCAGTATTTGCGGCCAACTGCTCAGTACGGAGCAG ATTCTTTTC AATGGACGTTCTGGTCTTGGAATCCTGACTCCGGGGATACAGG CGGTATCCTGAAAGACGATTGGCAGACCGTGGACACTGTTAAGGACGGGTAC TTGGCGCCGATTAAAAGCTCGATCTTTGACCCAGTCGGCGCTAGCGCTTCCC CATCTTCACAACCTTCGCCGAGCGTCAGCCCCAGCCCAAGCCCAAGCCCGTC TGCCAGCAGAACCCCCACTCCCACACCTACCCCCACGGCCTCACCAACTCCG ACGCTCACTCCTACGGCGACGCCAACACCAACTGCTTCACCCACTCCTAGCC CCACCGCAGCGAGCGGGGCTAGGTGCACCGCTTCTTACCAGGTCAACTCTGA CTGGGGTAATGGCTTCACCGTGACTGTGGCGGTCACTAACTCAGGAAGCGTC GCGACGAAAACCTGGACTGTGTCCTGGACGTTCGGGGGCAACCAAACAATCA CCAACAGCTGGAACGCTGCAGTTACGCAGAATGGGCAAAGCGTCACGGCGC GCAATATGAGCTACAACAACGTGATTCAACCAGGCCAGAATACCACATTCGG TTTTCAAGCAAGCTATACCGGGTCAAACGCTGCCCCAACTGTCGCTTGTGCT GCCTCA (SEQ ID NO: 1933).
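One way to check the codon-optimized listing is to translate it and confirm that the GCT codon after the start ATG encodes residue 42. The Python sketch below uses the standard genetic code; the function names are hypothetical, and the example translates only the first 24 bases of the listing. The +40 offset follows from Met occupying construct position 1 and native residue 42 occupying position 2.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence, ignoring whitespace from the printed listing."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def native_residue(construct_position: int) -> int:
    """Map a 1-based position in the translated construct (Met = 1) to native numbering.

    Position 2 is encoded by the GCT codon and corresponds to native residue 42.
    """
    return construct_position + 40


print(translate("ATGGCTGGAGGAGGATACTGGCAC"))  # MAGGGYWH
print(native_residue(2))  # 42
```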
A DNA fragment carrying this sequence was ligated into the Pichia pastoris integrative expression vector pGAPZα (Invitrogen, Carlsbad CA), described above. pGAPZα is an integrative vector for the transformation of P. pastoris GS115. The resulting plasmid, pGAPZα-P54583 (FIG. 5), was then introduced into P. pastoris GS115 cells according to the Invitrogen protocol.
Recombinants were selected based on zeocin resistance, and scored for their ability to mobilize the dye from AZCL-HE-cellulose (Megazyme International Ireland Ltd.) on agar plates.
Pichia strains expressing P54583, an unrelated endoglucanase from Trichoderma reesei (P07981, glycosyl hydrolase family 7), or albumin were grown in rich medium in the presence of zeocin. Supernatants were collected from these cultures and tested for endoglucanase activity using the Cellazyme C assay (see below), in which endoglucanases release blue dye (AZCL) from a cellulosic substrate (Megazyme International Ireland, Ltd.). These assays showed that Pichia clones expressing P54583 produced approximately twice as much endoglucanase activity as clones expressing P07981. See FIG. 6. In FIG. 6, Blank is a sample containing uninoculated culture medium, and activity is expressed in cellulase units.
Because mutagenesis could be performed more easily in S. cerevisiae, the coding sequence for P54583 was transferred from pGAPZα-P54583 to pAL410, producing the plasmid pAL410-P54583 (FIG. 5). pAL410 is an autonomously replicating vector for the transformation of S. cerevisiae. S. cerevisiae strains carrying plasmid pAL410-P54583 or the negative control plasmid pAL410 were spotted on YPD agar plates containing 100 mg/L G418, onto which a layer of 0.2% AZCL-HE-cellulose (Megazyme) in 2% agar had been applied. Details of the plate activity assay are provided below. As shown in FIG. 7, two independent transformants carrying pAL410-P54583 and two carrying pAL410 were spotted onto AZCL-HE-cellulose. Mobilization of the AZCL dye was clearly visible only in the vicinity of the clones that secreted active endoglucanase.
Measuring the Activity of Endoglucanases and Their Intein-Modified Derivatives:
Plate Activity Assays. Activity assay plates were prepared by applying a thin layer of liquid agar containing 0.2% AZCL-HE-cellulose substrate onto YPD selection plates containing 100 mg/L G418. Once the plates had solidified, yeast cells containing the genes of interest were plated on top of the substrate layer. The cells were then cultured at 30 °C. Active endocellulase mobilizes the AZCL dye, and a blue halo forms in the surrounding medium. This is a qualitative test for assessing activity from different strains and constructs over varying temperatures and time frames. It can also be used to test the activity of intein-modified P54583 derivatives.
Liquid Phase Activity Assays. Liquid assays allow greater variation in assay and sample preparation conditions and give quantifiable results from absorbance readings on a spectrophotometer or plate reader. Assay conditions can vary over a wide range of pH values, temperatures, durations and sample preparations. Sample preparation may include varying growth conditions, concentration and purification methods, and pretreatments. The assays can be modified to measure activity in culture supernatants or cell pellets.
Liquid assay on Cellazyme C (Megazyme) tablet substrate. Cellazyme C tablets are pre-pelleted AZCL-HE-cellulose substrate (Megazyme International Ireland, Ltd.). This assay gives results that correlate well with the plate assay. A standard Cellazyme C tablet assay is conducted as follows. Mix a protein sample from liquid culture with 25 mM NaOAc buffer, pH 4.5, to a final volume of 500 µL. Equilibrate the samples at 42 °C for 5 minutes. Add 1 Cellazyme C tablet to each sample and incubate for 30 minutes at 42 °C. To stop the reaction, add 1 mL of 20% Tris base. Measure Abs590 in a clear-bottom plate in a plate reader. Samples with more endocellulase activity degrade the substrate more quickly, causing Abs590 to increase. Using this assay, P54583 activity was determined to be optimal around pH 5.0 and to increase up to at least 70 °C. Longer reaction times give higher absorbance readings (590 nm) (FIGS. 8 and 9). As shown in FIG. 8, P54583 has increased activity from pH 4.5 to pH 8.0. However, there is no significant activity above that of the negative control at pH 2.0. As shown in FIG. 9, the Cellazyme C assay can be used to demonstrate that P54583 activity increases with temperature, and that the signal intensity (absorbance at 590 nm) increases with time.
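The volumes in the standard Cellazyme C procedure can be worked out per sample as in the Python sketch below. The function and key names are hypothetical, the sample volume in the example is arbitrary, and the tablet's own volume is ignored.

```python
FINAL_VOLUME_UL = 500.0  # protein sample + 25 mM NaOAc, pH 4.5
STOP_VOLUME_UL = 1000.0  # 1 mL of 20% Tris base


def cellazyme_reaction(sample_ul: float) -> dict:
    """Volumes (µL) for one standard Cellazyme C tablet reaction."""
    if not 0 < sample_ul <= FINAL_VOLUME_UL:
        raise ValueError("sample volume must be greater than 0 and at most 500 µL")
    return {
        "sample_uL": sample_ul,
        "NaOAc_buffer_uL": FINAL_VOLUME_UL - sample_ul,
        "stop_Tris_uL": STOP_VOLUME_UL,
        "volume_after_stop_uL": FINAL_VOLUME_UL + STOP_VOLUME_UL,
    }


print(cellazyme_reaction(50.0)["NaOAc_buffer_uL"])  # 450.0
```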
Liquid assay with PNP-C. Activity of endoglucanases such as P54583 is also detectable with para-nitrophenyl cellobioside (PNP-C) substrate. A standard PNP-C assay is a 50 µL reaction containing 5 mM PNP-C substrate, active enzyme and a buffer to control the pH. This assay can be performed under a wide range of pH, time and temperature conditions. To stop the reaction and amplify the signal intensity, 25 µL of sodium carbonate, pH 10.5, is added at a given time. The absorbance at 405 nm (Abs405) is measured in a spectrophotometric plate reader. Higher activity gives a higher reading (FIG. 10). As shown in FIG. 10, a PNP-C assay of P54583 shows that enzyme activity increases with assay temperature.
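Assuming the reaction volumes are in microliters, setting up the 50 µL PNP-C reaction at 5 mM substrate is a C1V1 = C2V2 calculation. The Python sketch below uses a hypothetical 20 mM PNP-C stock; the stock concentration is not given in the text.

```python
def stock_volume_ul(final_mM: float, reaction_ul: float, stock_mM: float) -> float:
    """Volume of stock needed so that C1 * V1 = C2 * V2."""
    if stock_mM < final_mM:
        raise ValueError("stock must be at least as concentrated as the final reaction")
    return final_mM * reaction_ul / stock_mM


REACTION_UL = 50.0   # standard PNP-C reaction volume
PNPC_FINAL_MM = 5.0  # final PNP-C concentration
STOP_UL = 25.0       # sodium carbonate, pH 10.5, added to stop the reaction

# Hypothetical 20 mM PNP-C stock: 12.5 µL per 50 µL reaction.
print(stock_volume_ul(PNPC_FINAL_MM, REACTION_UL, 20.0))  # 12.5
print(REACTION_UL + STOP_UL)  # 75.0 (µL in the well when Abs405 is read)
```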
Enzchek liquid assay (Invitrogen). Enzchek is a synthetic fluorogenic substrate that is also useful for endoglucanase activity assays. A standard assay with the Enzchek substrate is as follows. Mix equal volumes of room-temperature substrate and room-temperature enzyme, buffered at about pH 5.0, in black-well plates (for example, Corning #3820 black 384-well plates) for fluorescence reading. Incubate at room temperature protected from light and measure fluorescence at excitation/emission wavelengths of 340/450 nm. Fluorescence readings increase over time and with more concentrated samples. Readings can be taken without stopping the reaction as early as 5 minutes after the start of the assay, or after several hours of incubation for samples with low activity. Stopping the reaction allows all samples to be read after the same incubation time, which is useful when processing hundreds or thousands of samples. To stop the reaction, add an equal volume of 20% Tris base. This causes an immediate increase in the fluorescence reading, which appears consistent across all samples and is stable for several hours. This activity assay is sensitive and reproducible and can be used for high-throughput testing on a liquid handler. Standard liquid handler conditions can be set as 10 µL reactions using whole culture in Corning #3820 plates.
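For high-throughput runs, each sample needs a well. The Python sketch below lays samples out on 384-well plates (16 rows, A to P, by 24 columns) and records the 10 µL reaction as equal substrate and enzyme volumes plus an equal-volume stop. Row-major filling, the function names and the sample names are assumptions; liquid handlers are often programmed column by column instead.

```python
import string
from typing import Dict, List, Tuple

ROWS = string.ascii_uppercase[:16]  # A-P
COLUMNS = range(1, 25)              # 1-24
REACTION_UL = 10.0                  # equal volumes of substrate and enzyme
STOP_UL = REACTION_UL               # equal volume of 20% Tris base


def wells_384() -> List[str]:
    """Well names of a 384-well plate in row-major order (A1, A2, ..., P24)."""
    return [f"{row}{col}" for row in ROWS for col in COLUMNS]


def assign_wells(samples: List[str]) -> Dict[str, Tuple[int, str]]:
    """Map each sample to (plate number, well), filling plates in row-major order."""
    wells = wells_384()
    layout = {}
    for index, sample in enumerate(samples):
        plate, well_index = divmod(index, len(wells))
        layout[sample] = (plate + 1, wells[well_index])
    return layout


samples = [f"sample-{i}" for i in range(400)]  # hypothetical sample IDs
layout = assign_wells(samples)
print(layout["sample-0"], layout["sample-383"], layout["sample-384"])
# (1, 'A1') (1, 'P24') (2, 'A1')
print(REACTION_UL / 2, STOP_UL)  # 5.0 10.0
```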
Selection of a yeast host to express intein-modified endoglucanases. To test whether alternative yeast hosts could be more appropriate for (i) mutagenesis and (ii) selection of clones expressing intein-modified endoglucanases, the ability of two yeast strains (INVSc-1 (Invitrogen, Carlsbad CA) and SCBJ (aka BJ5465, American Type Culture Collection, Manassas VA, Cat. No. 20829)) to take up exogenous DNA was compared. Plasmid DNA samples, either supercoiled or linearized, were prepared and used to transform samples of each cell type with the Zymo Research EZ yeast transformation kit. Table 5 below shows the relative transformation efficiency of the two S. cerevisiae strains. As shown, the transformation efficiency was 100 times higher with SCBJ than with INVSc-1. SCBJ cells also formed visible colonies earlier than INVSc-1 cells. Table 5

Pull-down concentration and purification of endoglucanases expressed in yeast. Like many endoglucanases, P54583 has a C-terminal carbohydrate-binding domain, which binds the enzyme to its crystalline substrate. Based on this characteristic, methods were tested to pull down and partially purify the endoglucanase with a carbohydrate analogue. Six equal aliquots were collected from culture supernatants of cells either expressing P54583 or carrying the negative control vector (pAL410, FIG. 5). Avicel™ (microcrystalline cellulose) was added to five aliquots (all but one, which was kept as the untreated sample). All aliquots were then stirred at room temperature for one hour. After incubation, the Avicel was pelleted and the supernatant was discarded. Four pellets were washed with elution buffers, as indicated in FIG. 11. The eluate was immediately transferred to clean tubes and brought to neutral pH. The fifth Avicel pellet did not receive an elution wash. The activity of all six aliquots was then measured with Cellazyme C tablets. As shown in FIG. 11, microcrystalline cellulose can be used to separate active cellulase from culture samples. This is a simple, inexpensive and fast method for purifying proteins and concentrating supernatants and cell lysates. The enzyme can then be analyzed by western blot, or its activity can be assayed directly on the Avicel™ or, to a lesser extent, after elution with a variety of buffers.
Immunological assays. P54583 can be detected directly by immunological assays, such as western blots. FIG. 12 illustrates the results of a western blot. For this assay, proteins were derived from either culture supernatants or cell pellet lysates and were deglycosylated before electrophoresis. The assay shows that most of the detectable protein resides in the culture supernatant, which suggests that antibody-based affinity purification of the enzyme could be useful for protein concentration and purification.
Example 8 - Intein modification of P54583 endoglucanase. P54583 intein insertion sites were identified by the method presented in the detailed description. FIG. 13 depicts the relative positions of the sites selected in P54583 for insertion of the Tth intein. The relative positions of the catalytic domain (GH5), the linker domain (narrow bar) and the carbohydrate-binding module (CBM2) are shown. Two catalytic glutamates are conserved among members of the GH5 family. The numbering of the serine, threonine and cysteine residues shown is relative to the "mature" form of the polypeptide, as it would be secreted from S. cerevisiae following cleavage of the alpha signal peptide, except for C75 and C465, which are actually at positions 35 and 425 relative to the cleavage site.
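Candidate C, T and S residues, and their labels under a given numbering, can be listed as in the hypothetical Python sketch below. Because a site's label depends on where numbering starts (mature versus immature form, as noted above), the first position is a parameter. The example uses the first residues encoded by the codon-optimized P54583 sequence, with Met as residue 1, which yields S10, one of the Tth insertion positions tested in this example; whether this numbering matches FIG. 13 for every site is an assumption.

```python
from typing import List

NUCLEOPHILIC_RESIDUES = "CTS"


def nucleophile_sites(protein: str, first_position: int = 1) -> List[str]:
    """List C, T and S residues as labels such as 'S10'.

    `first_position` is the number given to the first residue, so the same
    sequence can be labelled in either numbering scheme.
    """
    return [
        f"{residue}{index}"
        for index, residue in enumerate(protein.upper(), start=first_position)
        if residue in NUCLEOPHILIC_RESIDUES
    ]


# First residues encoded by the codon-optimized listing (Met = 1).
print(nucleophile_sites("MAGGGYWHTSG"))  # ['T9', 'S10']
```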
Sequences encoding the recombinant P54583 proteins were then assembled using an SOE PCR strategy (Horton RM, Hunt HD, Ho SN, Pullen JK, Pease LR. 1989. Engineering hybrid genes without the use of restriction enzymes: gene splicing by overlap extension. Gene 77(1):61-68, which is incorporated herein by reference in its entirety as if fully set forth), as depicted in FIG. 14. This strategy is similar to the one used above to assemble the intein-modified xylanase genes. The primers were designed to anneal to: (A) the sequence encoding the alpha signal peptide in pAL410-P54583 (see FIG. 5); (B) a region within the coding sequence for P54583 that is adjacent to the insertion site; (C) the 5' end of the coding sequence for the Tth intein; (D) the 3' end of the coding sequence for the Tth intein; (E) a region within the coding sequence for P54583 that is adjacent to the insertion site (note that this non-A site overlaps that covered by the C primer); and (F) a region within the CYC1 terminator sequence of pAL410-P54583.
PCR 1 used primers A and B to assemble a short product that includes the coding sequence for a portion of the alpha factor signal, as well as the N-terminal portion of the endoglucanase (P54583-N). The extreme 3' end of PCR product 1 includes a short segment that is homologous to the extreme 5' end of the Tth intein. PCR 2 uses primers C and D to amplify the coding sequence of the Tth intein. PCR 3 uses primers E and F to amplify the coding sequence for the C-terminal portion of the endoglucanase (P54583-C, which may include all or part of the catalytic domain, as well as the carbohydrate-binding module), together with the "C+1" amino acid, a short segment that is homologous to the extreme 3' end of the Tth intein, and a portion of the CYC1 terminator (CYC1t) of pAL410-P54583. PCR products 1, 2 and 3 were then combined in a single PCR reaction. Because of their homology to the ends of the Tth intein, PCR products 1 and 3 anneal to either end of PCR product 2. DNA synthesis and amplification with the outermost primers (A and F) leads to assembly of the full-length product, as indicated at the bottom of the diagram. The final product is often referred to simply as an "NIC" (N-terminal fragment, intein and C-terminal fragment). This method can be used to construct an intein-modified protein of any type, at any insertion site, by choosing the appropriate primers. The intein insertion site can be any amino acid in the protein, either by using a naturally occurring nucleophilic amino acid at position zero or by mutating the amino acid at position zero to a nucleophilic amino acid. The nucleophilic amino acid can be a C, T or S residue.
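The overlap logic of the SOE assembly can be checked in silico before primers are ordered. The Python sketch below is a simplification: it works on top-strand sequences only, ignores polymerase behavior, and assumes exact-match overlaps. The function names and toy sequences are hypothetical, and real overlaps are typically longer than the 6 bases used here.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two fragments whose 3' end (left) and 5' end (right) share an identical
    overlap, as the annealed ends of SOE PCR products do. Uses the longest overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


def assemble_nic(n_product: str, intein: str, c_product: str, min_overlap: int = 15) -> str:
    """Predict the full-length N-fragment/intein/C-fragment (NIC) sequence from
    PCR products 1, 2 and 3 (top strands, 5'->3')."""
    return join_by_overlap(join_by_overlap(n_product, intein, min_overlap), c_product, min_overlap)


# Toy sequences (hypothetical): product 1 ends with the intein's first 6 bases,
# product 3 starts with the intein's last 6 bases.
print(assemble_nic("ATGAAATGCGTA", "TGCGTACCGGTTAA", "GGTTAAGCTTAG", min_overlap=6))
# ATGAAATGCGTACCGGTTAAGCTTAG
```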
Typical SOE PCR reactions were 20 µL and contained 10 µL of Phusion HF DNA polymerase Master Mix (New England Biolabs, Ipswich MA), 4 µL of each primer (from a 1 µM stock) and 2 µL of the appropriate template, diluted to about 0.1-1 ng/µL. Thermal cycling was performed as recommended by the manufacturer of Phusion HF DNA polymerase. After the initial round of PCR reactions, the products were gel-purified using the Wizard SV Gel and PCR Clean-Up kit (Promega, Madison, WI), and 1 µL of each first-round product was mixed to assemble the second-round (full-length) product in a subsequent PCR reaction, under conditions virtually identical to the first round, except that the extension times were increased from 30 s to as much as 60 s.
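Taking the volumes as microliters and the primer stock as 1 µM, the reaction components add up to 20 µL, and the final primer and template amounts follow from C1V1 = C2V2. The arithmetic is sketched below in Python; treating the master mix as 2x (so that 10 µL gives 1x in 20 µL) is an assumption about the product, and the names are hypothetical.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2, in the units of `stock`."""
    return stock * added_ul / total_ul


# 20 µL SOE PCR as described above (volumes in µL).
reaction = {
    "Phusion HF master mix (2x)": 10.0,
    "forward primer (1 µM stock)": 4.0,
    "reverse primer (1 µM stock)": 4.0,
    "template (0.1-1 ng/µL)": 2.0,
}
total_ul = sum(reaction.values())
print(total_ul)  # 20.0
print(final_concentration(1.0, 4.0, total_ul))  # 0.2 (µM of each primer)
print(2.0 * 0.1, 2.0 * 1.0)  # 0.2 2.0 (ng of template per reaction)
```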
To prepare any desired intein-modified P54583 derivative, PCR products tailored to each intein insertion position can be prepared. However, some components of this experimental arrangement are modular. For example, primers C and D can be used to prepare PCR product 2, which can then be used to assemble any of the planned recombinants. Similarly, primers A and F can be used to prepare PCR products 1 and 3, respectively, regardless of the insertion position. As such, only primers B and E are unique to a given intein insertion event. Table 6 below lists the sequences (in 5'-3' orientation) of the oligonucleotide primers used to assemble each of the intein-modified endoglucanases. Although primers B and E are unique to each product, each contains a region that is homologous to a terminus of the Tth intein, as described with reference to FIG. 14. This region is underlined in each primer sequence in Table 6. Table 6



The insertion of the Tth intein at position C75 was accompanied by a small number of conservative amino acid changes close to the intein/extein junctions. To accommodate these changes, the Tth intein fragment (PCR 2) used to assemble the C75Tth product was amplified with alternative forms of primers C and D, as follows:
CC75Tth, 5'-TGCCTTGCCGAGGGTACCCGAGTCTTGGACGCGGCTACCGGGCA-3' (SEQ ID NO: 1968)
DC76Tth, 5'-GTTGTGCACGACAACCCCTTCGCTCACGAAGTTTGCAAAGGGT-3' (SEQ ID NO: 1969)
The insertion sites listed in Table 2 are the same as those depicted in FIG. 13. A series of primers was also designed to insert the PspPol and RecA inteins at various positions within P54583. The strategy for inserting these inteins is identical to that described with reference to FIG. 14, except that primer sequences B, C, D and E are all tailored to the specific intein. The compositions of these primers are shown below in Table 7 (primers used to assemble products encoding P54583 endoglucanases modified with the PspPol intein) and Table 8 (primers used to assemble products encoding P54583 endoglucanases modified with the RecA intein). Table 7

Table 8


Using the above primers, SOE PCR reactions were performed for all of the intein-modified endoglucanases that were designed. Full-length PCR products were then ligated into pCR-Blunt II-TOPO (Invitrogen), and individual clones were fully sequenced to ensure that no unintended base changes had occurred during PCR and/or cloning. Where mutations were found, all or part of the affected PCR reactions were repeated and the errors were corrected. Once the composition of a product encoding an intein-modified P54583 was confirmed, the entire fragment was excised from the pCR-Blunt II-TOPO vector and ligated into pAL410 (or a related vector). The resulting vectors were subsequently introduced into yeast cells. Yeast transformants were typically verified by a combination of colony PCR and plasmid recovery via miniprep (using reagents from the ZymoPrep Yeast Miniprep Kit II, Zymo Research, Orange CA). The plasmids recovered from yeast cells were then reintroduced into E. coli cells, propagated, isolated by E. coli plasmid miniprep and examined by restriction enzyme digestion to determine whether the plasmids had undergone any mutations or rearrangements since their introduction into the original yeast cells. When fully verified plasmids were recovered in this way, the corresponding yeast strain was used in subsequent experiments involving the intein-modified endoglucanase.
Transformants of S. cerevisiae carrying expression vectors for intein-modified endoglucanases were then spotted onto duplicate YPD plates (A and B) containing 100 mg/L G418, onto which a layer of 0.2% AZCL-HE-cellulose had been applied. These plates were incubated for 2 nights at 30 °C. Plate B was then transferred to 70 °C for several hours. FIG. 15 shows plates A and B; in order, positions 1-21 are P54583 T154Tth, P54583 S135Tth, P54583 S134Tth, P54583 S96Tth, P54583 S94Tth, P54583 T93Tth, P54583 C75Tth, P54583 S67Tth, P54583 P54583 S10Tth, P54583-Wild Type, empty vector pAL410, P54583 S393Tth, P54583 S353Tth, P54583 S330Tth, P54583 S321Tth, P54583 S314Tth, P54583 S277Tth, P54583 S237Tth and P54583 17: 17 1759, 1760, 1739, 1761, 111, 2006, 1762-1767, 1743 and 1742, respectively. Blue halos appear around some of the cells, indicating the presence of P54583 activity. The results of this experiment suggested that insertion of the Tth intein disrupts P54583 to varying degrees, depending on the insertion site, and that one or more of these intein-modified endoglucanases exhibit temperature-sensitive enzyme activity.
Insertion of the Tth intein into wild-type P54583 affects enzyme expression and activity levels, which can be measured by western analysis and activity assays. An Enzchek activity assay was performed on 20 P54583 NICs with controls. The 20 NICs had the Tth intein inserted at positions S10, S56, T61, S67, (C75), T93, S94, (S96), S134, (S135), T154, S192, S237, S290, S314, S321, S353 and (S393). These NICs have the sequences of SEQ ID NOs: 1761, 1739, 1760, 1759, 1741, 1758, 1757, 1756, 1755, 1754, 1753, 1742, 1743, 1768, 1766, 1765, 1763 and 1762. The culture supernatant was aliquoted. Half of the aliquots were heat-pretreated at 52.5 °C for 6 hours, while the other half were stored at 4 °C. The temperature and duration of the pretreatment may vary. The samples were then equilibrated to room temperature and subjected to an Enzchek assay (3-hour incubation with the substrate). At the end of the assay, endoglycanase activity was inferred from the amount of fluorescence in each sample. As shown in FIG. 16, the Enzchek activity assay revealed that a subset of the intein-modified endoglycanases produced enzyme activity above the background level (pAL410, empty vector control), and that some of these exhibited higher activity even when pre-incubated at 52.5 °C. In FIG. 16, "wt" denotes wild-type endoglycanase P54583. Because the residue numbering differs between constructs (reflecting either the immature form or the mature form of P54583 lacking its signal peptide), the position of the insertion-site amino acid relative to the immature form is given in parentheses for a subset of the NICs.
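The comparison with the empty-vector control in FIG. 16 can be expressed as a simple decision rule. The sketch below flags a construct as above background when its mean fluorescence exceeds the pAL410 mean by more than k standard deviations of the pAL410 replicates. The 3-SD default and the function name above_background are assumptions for illustration, not the criterion used in these experiments.

```python
from statistics import mean, stdev


def above_background(construct_reads, empty_vector_reads, k=3.0):
    """Return {construct: bool}, True where the construct's mean fluorescence
    exceeds the empty-vector mean by more than k empty-vector SDs."""
    threshold = mean(empty_vector_reads) + k * stdev(empty_vector_reads)
    return {name: mean(reads) > threshold for name, reads in construct_reads.items()}
```

The same rule can be applied separately to the 4 °C and 52.5 °C aliquots, since the background activity of the supernatant may itself change with pretreatment.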
The constructs described in FIG. 16 carry a His tag at the carboxyl terminus, which can be detected with an anti-His-tag antibody. Supernatants from the corresponding cultures were concentrated 20-fold and used in western blot assays (FIG. 17). In FIG. 17, "wt" indicates wild-type P54583, pAL410 indicates the empty vector, detection was with anti-His antibody (GenScript, Piscataway NJ), and the mature spliced protein appears as a 60 kDa band. An additional Tth-modified P54583, C465 (SEQ ID NO: 1769), was also tested by western blot, as shown in FIG. 17. Lanes marked with an asterisk also showed significant activity in the plate assay (see FIG. 15). The western blots showed that proteins with molecular weights similar to that of the wild-type enzyme could be detected in cultures expressing intein-modified enzymes, suggesting that intein splicing occurs in the recombinant proteins. Higher molecular weight species could also be detected in several samples; these may correspond to unspliced NICs, splicing intermediates, aggregates or other forms of the recombinant proteins. The NICs showed varying levels of protein accumulation, which corresponds to some extent to the activity measurements shown in FIG. 16.
Example 9 - Mutagenesis of intein-modified endoglycanases
Homologous recombination has been used to generate enormous diversity in DNA libraries in S. cerevisiae (Swers JS, Kellogg BA, Wittrup KD. 2004. Shuffled antibody libraries created by in vivo homologous recombination and yeast surface display. Nucleic Acids Res 32: e36, which is incorporated by reference in its entirety as if fully set forth). In this system, linear DNAs carrying coding sequences for polypeptides can be inserted into linearized expression vectors by yeast co-transformation. Error-prone PCR or other strategies can be used to mutagenize an entire intein-modified endoglycanase or parts of it (for example, the intein). The resulting products can be co-transformed into S. cerevisiae cells together with a suitable linearized expression vector (for example, pAL410 or a derivative thereof); the cells catalyze homologous recombination between the molecules, giving rise to collections of several thousand yeast clones, each carrying a unique recombinant expression vector. Yeast colonies arising from such an in vivo recombination protocol can thus express a variety of modified proteins, whose diversity is directly related to (or may even exceed) the degree to which the coding sequence has been mutagenized.
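The link between mutagenesis level and library diversity can be illustrated with a toy model. The sketch below assumes a uniform, independent per-base substitution rate (real error-prone PCR has base-specific biases) and uses the Poisson approximation: a target of L bases mutagenized at rate r per base carries on average rL substitutions, and a fraction exp(-rL) of copies carries none. The function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def error_prone_copy(seq, rate, rng):
    """Copy `seq`, replacing each base by a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def expected_substitutions(length, rate):
    """Mean number of substitutions per copy (rate x length)."""
    return rate * length


def fraction_unmutated(length, rate):
    """Poisson estimate of the fraction of copies with no substitution."""
    return math.exp(-expected_substitutions(length, rate))


if __name__ == "__main__":
    rng = random.Random(1)
    template = "ACGT" * 25
    mutant = error_prone_copy(template, 0.02, rng)
    print(sum(a != b for a, b in zip(template, mutant)), round(fraction_unmutated(500, 0.002), 3))
```

For example, at 0.002 substitutions per base over a 500 bp target, about 37% of copies would still be unmutated, which is one reason the mutation rate is tuned when a library is built.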
A series of recombination vectors has been developed for use in yeast in vivo recombination. Each recombination vector carries a truncated version of the Tth intein. The truncated Tth inteins lack most of the intein sequence, retaining only 70-80 bp from each of the 5' and 3' ends of the intein coding sequence. At the center of this DNA sequence is a unique EcoRV recognition site. The truncated Tth DNA sequence is shown below; it contains the EcoRV site GATATC.
TGCCTGGCCGAGGGCTCGCTCGTCTTGGACGCGGCTACCGGGCAGAGGGTC CCTATCGAAAAGGTGCGTCCGGGGATATCGAACCGGCCGGTAAGGCGAGAA CATTCGACTTGCGCGTTGACGTGG
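The sketch below locates EcoRV sites (GATATC, cut between GAT and ATC to leave blunt ends) in the truncated Tth sequence shown above and simulates the cut used to linearize a vector carrying it. The sequence is copied from the listing with the line-break spaces removed; the function names are hypothetical.

```python
# Truncated Tth intein sequence from the listing above, line-break spaces removed.
TRUNCATED_TTH = (
    "TGCCTGGCCGAGGGCTCGCTCGTCTTGGACGCGGCTACCGGGCAGAGGGTC"
    "CCTATCGAAAAGGTGCGTCCGGGGATATCGAACCGGCCGGTAAGGCGAGAA"
    "CATTCGACTTGCGCGTTGACGTGG"
)
ECORV_SITE = "GATATC"  # palindromic, so searching one strand is enough
ECORV_CUT = 3          # EcoRV cuts GAT^ATC, leaving blunt ends


def find_sites(seq, site):
    """Return the 0-based start of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def ecorv_digest(seq):
    """Cut `seq` at every EcoRV site and return the blunt-ended fragments in order."""
    cuts = [i + ECORV_CUT for i in find_sites(seq, ECORV_SITE)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(find_sites(TRUNCATED_TTH, ECORV_SITE), [len(f) for f in ecorv_digest(TRUNCATED_TTH)])
```

A single site means the circular vector is opened once, so its two ends carry the TthN and TthC portions. Because the ends are blunt, simply re-joining the fragments restores the original sequence, which is the self-ligation product the truncated design is meant to render non-functional.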
Expression vectors carrying such a truncated intein can easily be linearized by digestion with EcoRV. Because such vectors lack most of the "wild-type" intein sequence, expression vectors arising from homologous recombination in yeast are more likely to carry the mutations generated during error-prone PCR, since there is less "wild-type" intein to compete with the mutants during recombination. In addition, using this truncated intein in the recombination vector has the further benefit of decreasing the number of false positives that could arise from self-ligation of the vector in a high-throughput screening regime. Owing to the nature of the truncation, the truncated inteins introduce a frameshift into the endoglycanase gene, so that translation of the enzyme would be prematurely terminated. Such translation products are less likely to be enzymatically active. As a result, functional enzymes that arise during library screening are more likely to result from true recombination events involving DNA fragments that encode mutagenized inteins.
Using a strategy similar to that described with reference to FIG. 14, expression vectors derived from pAL410-P54583noHis were prepared. In these expression vectors, the truncated Tth intein sequence was introduced in place of the full-length intein at each of positions S56, C75, S192 or S237. This collection of recombination vectors was then used to generate libraries of endoglycanases modified with mutagenized inteins in SCBJ yeast cells. Referring to FIGs. 18A-C, a PCR intein mutagenesis scheme is illustrated. Primers (for example, S237up and S237down) flanking the intein insertion site in the template expression vector (pAL410-P54583noHis S237Tth, FIG. 18A) can be used to amplify a specific region of the recombinant vector that contains the complete intein coding sequence as well as parts of the flanking extein coding sequences. Alternatively, primers that amplify only intein sequences can be used. Under appropriate conditions, PCR products were generated with random mutations (stars) spread among the collection of amplified DNA molecules. These mutagenized DNA molecules can be mixed with an appropriate vector, as shown in FIG. 18C, that has been linearized by digestion with the restriction endonuclease EcoRV. The mixture can then be introduced into yeast cells to drive recombination. In the example above, the DNA molecules depicted in FIG. 18B would be used to create a library of mutagenized inteins at position S237, using linearized pAL410-P54583noHis S237Tth-trunc as the vector. Primers tailored for positions S56, C75 or S192 can likewise be used in conjunction with the respective recombination vectors depicted in FIG. 18C. Such a strategy allows the inclusion of DNA molecules that carry mutations in the flanking extein regions (in this example, an endoglycanase) as well as within the intein.
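A simplified model of the in vivo recombination step is sketched below. It assumes that the ends of the mutagenized PCR product (for example the P54583* and intein-end regions) are identical to sequences in the two arms of the EcoRV-linearized vector, and it places both crossovers at those terminal homology blocks, so every mutation inside the insert is retained. In real gap repair the crossover points can fall anywhere within the shared sequence. The 30 bp default homology length and the function name gap_repair are assumptions.

```python
def gap_repair(vector_left, vector_right, insert, arm=30):
    """Simplified gap-repair model (assumption): the first `arm` bases of the
    insert must occur in the left vector arm and its last `arm` bases in the
    right arm; crossovers are placed at those blocks, so all internal insert
    sequence (including any PCR-induced mutations) is kept."""
    left_block, right_block = insert[:arm], insert[-arm:]
    i = vector_left.rfind(left_block)
    j = vector_right.find(right_block)
    if i < 0 or j < 0:
        raise ValueError("insert lacks homology to one of the vector arms")
    # Vector 5' of the left homology block + insert + vector 3' of the right block.
    return vector_left[:i] + insert + vector_right[j + arm:]
```

An insert whose ends do not match the vector arms raises an error in this model, mirroring the requirement that primers be chosen so the PCR product overlaps both sides of the EcoRV cut.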
If, however, PCR primers that amplify only intein sequences are used during error-prone PCR, then any of the recombination vectors can be used to host the altered intein coding sequences. In FIG. 18A, P54583-N and P54583-C refer to the coding sequences for the N- and C-terminal portions of the endoglycanase. In FIG. 18B, P54583* refers to the small flanking portions derived from the endoglycanase coding sequences, which can be included in the mutagenized PCR product with judiciously designed primers. In FIG. 18C, TthN and TthC denote the N- and C-terminal portions of the Tth coding sequence that are separated by the EcoRV site in the truncated intein. Other abbreviations are as described with reference to FIG. 5.
Example 10 - P54583 modified with mini-inteins
Based on the initial plate and liquid activity assays, a subset of the insertion sites described above was chosen for modification with eight additional mini Tth inteins: mTth001 (SEQ ID NO: 92), mTth002 (SEQ ID NO: 93), mTth003 (SEQ ID NO: 94), mTth004 (SEQ ID NO: 95), mTth005 (SEQ ID NO: 96), mTth007 (SEQ ID NO: 98), mTth008 (SEQ ID NO: 99) and mTth010 (SEQ ID NO: 101). One intein was inserted per construct. Position S56 of P54583 was the first site chosen for modification with mini-inteins. The mini-Tth inteins were inserted at this position in a single in vivo recombination reaction in yeast. Following recovery and growth of the yeast on YPD G418 plates, 36 separate colonies were grown for activity testing. Two of the 36 expressed activity above the baseline level. Plasmids were recovered from these two strains and subjected to DNA sequence analysis. Both were found to carry the mTth010 mini-intein. The DNA sequence of the mTth010 mini-intein is shown below, with the corresponding amino acid sequence beneath each line:
mTth010
tgcatggccgagggctcgctcgtcttggacgcggctaccgggcagagggtccctatcgaa
CLAEGSLVLDAATGQRVPIE
aaggtgcgtccggggatggaagttttctccttgggacctgattacagactgtatcgggtg
KVRPGMEVFSLGPDYRLYRV
cccgttttggaggtccttgagagcggggttagggaagttgtgcgcctcagaactcggtca
PVLEVLESGVREVVRLRTRS
gggagaacgctggtgttgacaccagatcacccgcttttgacccccgaaggttggaaacct
GRTLVLTPDHPLLTPEGWKP
ctttgtgacctcccgcttggaactccaattgcagtcagagatgttgagactggagaggtt
LCDLPLGTPIAVRDVETGEV
ctctgggaccctattgttgctgtcgaaccggccggtaaggcgagaacattcgacttgcgc
LWDPIVAVEPAGKARTFDLR
gttccaccctttgcaaacttcgtgagcgaggacctggtggtgcataac (SEQ ID NO: 2008)
VPPFANFVSEDLVVHN (SEQ ID NO: 101)
To test whether the endoglycanase activity of the P54583 derivative carrying this mini-intein (aka "P54583 S56mTth010") depended on the ability of the mini-intein to splice, a modified version of the construct was prepared.
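Because the mTth010 listing pairs each DNA line with its translation, the two can be checked against each other. The sketch below translates the DNA with the standard genetic code and reports positions where the translation and the printed protein differ. Both sequences are transcribed line by line from the listing; the first protein line is taken as CLAEGSLVLDAATGQRVPIE, whose final E (Glu) corresponds to the gaa codon that ends the first DNA line. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# mTth010 DNA (SEQ ID NO: 2008) and protein (SEQ ID NO: 101), as listed.
DNA = "".join([
    "tgcatggccgagggctcgctcgtcttggacgcggctaccgggcagagggtccctatcgaa",
    "aaggtgcgtccggggatggaagttttctccttgggacctgattacagactgtatcgggtg",
    "cccgttttggaggtccttgagagcggggttagggaagttgtgcgcctcagaactcggtca",
    "gggagaacgctggtgttgacaccagatcacccgcttttgacccccgaaggttggaaacct",
    "ctttgtgacctcccgcttggaactccaattgcagtcagagatgttgagactggagaggtt",
    "ctctgggaccctattgttgctgtcgaaccggccggtaaggcgagaacattcgacttgcgc",
    "gttccaccctttgcaaacttcgtgagcgaggacctggtggtgcataac",
]).upper()
PROTEIN = "".join([
    "CLAEGSLVLDAATGQRVPIE", "KVRPGMEVFSLGPDYRLYRV", "PVLEVLESGVREVVRLRTRS",
    "GRTLVLTPDHPLLTPEGWKP", "LCDLPLGTPIAVRDVETGEV", "LWDPIVAVEPAGKARTFDLR",
    "VPPFANFVSEDLVVHN",
])


def translate(dna):
    """Translate complete codons with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def mismatches(dna, protein):
    """Return (1-based position, translated residue, printed residue) for each difference."""
    translated = translate(dna)
    diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(translated, protein)) if a != b]
    if len(translated) != len(protein):
        diffs.append(("length", len(translated), len(protein)))
    return diffs


if __name__ == "__main__":
    print(mismatches(DNA, PROTEIN))
```

Run on the sequences as transcribed here, the check reports a single difference, at residue 2: the DNA codon ATG gives Met, while the protein line shows Leu. One of the two listings may therefore contain a transcription error at that position.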
In the modified version, the terminal amino acids of the intein (the N-terminal cysteine and the C-terminal asparagine; see the sequence above) were replaced by alanines. The N-terminal cysteine and the C-terminal asparagine are likely to play crucial roles in catalysis of intein splicing, and alanine substitutions of these residues are known or likely to prevent intein splicing. Referring to FIG. 19, samples were taken from SCBJ yeast cultures carrying the empty expression vector pAL410 (negative control), an expression vector encoding the uninterrupted enzyme P54583 (wt), an expression vector encoding a derivative carrying the mini-intein at position S56 (P54583 S56Tth139), or an expression vector encoding a derivative carrying the disabled mini-intein at position S56 (P54583 S56AThA139). The samples were tested for endoglycanase activity with a four-hour incubation at room temperature in the Enzchek assay. In contrast to the splicing-competent mini-intein, the disabled intein reduced endoglycanase activity almost to the level of the negative control. This trend was consistent regardless of whether the samples had been pre-incubated at low (4 °C) or high (55 °C) temperature for 6 hours before testing. From this it was concluded that a mini-intein unable to splice at position S56 of P54583 disrupts enzyme activity, whereas a splicing-competent mini-intein at the same position allows much of the native enzyme activity to be reconstituted. To investigate whether the mini-intein at that position showed temperature-sensitive splicing, that is, whether pre-incubation of the recombinant enzyme at particular temperatures reconstituted different amounts of endoglycanase activity, samples from a single SCBJ yeast culture expressing P54583 S56mTth010 (aka P54583 S56Tth139) were pre-incubated for six hours at various temperatures.
After that period, the samples were uniformly cooled to 4 °C and then subjected to the standard Enzchek assay (incubation with substrate at room temperature). Referring to FIG. 20, pre-incubation at temperatures as high as 46.6 °C reconstituted no more activity than pre-incubation at 4 °C. However, pre-incubation of the enzyme for 6 hours at 50.8-53.6 °C led to a modest increase in enzyme activity. At higher temperatures, endoglycanase activity appeared to fall below the levels reached by enzyme that had not been heated above 4 °C. This apparent decrease may be due, at least in part, to loss of a background "endoglycanase-like" activity that can be detected in yeast culture supernatants and that is heat-labile at such high temperatures. When total endoglycanase activity is low (as in this particular experiment), the effect of this background activity can be significant. The effect can be seen to some extent in the data depicted in FIG. 19, where the endoglycanase "activity" of the negative control sample (pAL410) appears to decrease when the culture is pre-incubated at 55 °C before the assay. FIG. 20 shows that a temperature between 50.8 °C and 53.6 °C reconstitutes the greatest amount of activity from this recombinant enzyme.
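The comparisons in these pre-incubation experiments are made against the sample held at 4 °C. Writing A_T for the activity measured after pre-incubation at temperature T, one way to express the induction and the corresponding percentage change is:

```latex
F(T) = \frac{A_T}{A_{4\,^{\circ}\mathrm{C}}}, \qquad
\text{increase}\,(\%) = 100\,\bigl(F(T) - 1\bigr)
```

Here F = 1 means that pre-incubation reconstituted no additional activity, F < 1 corresponds to the apparent losses seen at the highest temperatures, and, for example, F = 1.75 corresponds to a 75% increase.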
Eight mini-inteins, having the sequences of SEQ ID NOs: 2009-2016 respectively, were introduced at position S237 of P54583, one intein per construct, via in vivo recombination. Candidate recombinant yeast colonies were recovered in each case, and the plasmid carried by each was isolated and checked by DNA sequencing to confirm that the gene encoding the intein-modified endoglycanase was intact and free of point mutations or other changes. Once a yeast strain had been identified for each of the mini-intein-modified endoglycanases, the entire set was subjected to endoglycanase assays. Strains carrying the mTth010 mini-intein showed clear endoglycanase activity. As shown in FIG. 21, this intein-modified endoglycanase also showed an optimal induction temperature close to 52.5 °C. Pre-incubation of the enzyme for 6 hours at 50.8-53.6 °C increased enzyme activity by approximately 75%. The assays were carried out at room temperature for 120 hours using the Enzchek substrate. Additional P54583-mTth010-S237 intein-modified proteins that were isolated and showed improved activity levels are given as SEQ ID NOs: 1751 and 1752.
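Reading an optimum from a pre-incubation series (the temperature series of FIGs. 20 and 21, or the duration series of FIG. 22) amounts to finding the condition that maximizes recovered activity. The sketch below does this and also lists every condition within a chosen fraction of the maximum, which is one way to describe a window such as 50.8-53.6 °C. The 90% fraction and the function names are assumptions.

```python
def best_condition(activity_by_condition):
    """Return the condition (e.g. temperature in °C or time in hours) with the highest activity."""
    return max(activity_by_condition, key=activity_by_condition.get)


def near_best_conditions(activity_by_condition, fraction=0.9):
    """Return the sorted conditions whose activity is at least `fraction` of the maximum."""
    top = max(activity_by_condition.values())
    return sorted(c for c, a in activity_by_condition.items() if a >= fraction * top)
```

Applied to each culture separately, the same two functions describe both the optimal temperature and the time needed to reach maximal activation.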
Having shown that activity could be recovered from the intein-modified endoglycanase P54583 S237mTth010 by pre-incubation at about 52.5 °C, it was then tested whether the length of this pre-incubation step influenced enzyme activity. Four separate colonies from an SCBJ (pAL410 P54583noHis S237Tth139) culture were grown independently in rich medium. Aliquots were taken from each culture and divided into multiple samples, and each split sample was pre-incubated at 52.5 °C for a different period as follows: 0 hours (not heated; held only at 4 °C), 2 hours, 4 hours, 6 hours, 8 hours or 10 hours. After the pre-incubation step, the individual split samples were stored at 4 °C until the assays were performed. Each split sample was then tested with the Enzchek assay at room temperature. As shown in FIG. 22, three of the four cultures reached their highest level of activation within 2-4 hours. Longer pre-incubation times either did not improve activation of the enzyme or decreased the amount of activity recovered.
Example 11 - Mutagenesis and selection of intein-modified endoglycanases
Using the strategies outlined with reference to FIG. 18, error-prone PCR was used to create collections (libraries) of mutants carrying base-pair changes in the DNA encoding the inteins and the adjacent portions of the endoglycanase. Libraries were prepared from both full-length inteins and mini-inteins at each of several positions in P54583, including positions S56, C75, S192 and S237. Yeast clones from each library were collected for preliminary analysis. Colony PCR (using KAPA2G Robust Taq, from KAPA Biosystems, Waltham, MA) was used to amplify the portion of the endoglycanase gene that included the mutagenized intein in each case. These PCR products were then sequenced to assess the frequency and nature of the mutations in the library.
Following the initial assessment of mutation frequencies, clones from an individual library were spread on selective medium (YPD agar supplemented with 100 mg/L G418) and cultured at 30 °C for 2-3 days. 3760 colonies were picked from these plates, along with numerous positive [SCBJ (pAL410 P54583noHis)] and negative [SCBJ (pAL410)] controls, and inoculated into 1 ml volumes of liquid YPD medium supplemented with 100 mg/L G418 dispensed in 96-well deep-well plates, which were then incubated for 3 days at 30 °C with vigorous shaking. Aliquots were then removed from each liquid culture, divided into replicate samples and subjected to the Enzchek assay. For each culture, a portion of the replicate samples was pre-incubated at 52.5 °C for 4 hours while the remainder was incubated at room temperature, after which all replicate samples were equilibrated to room temperature and divided into triplicate samples before mixing with Enzchek substrate. After 90 minutes, the endoglycanase reaction was stopped by adding an equal volume of 20% Tris base, and total fluorescence units were measured. The degree of heat-sensitive enzyme activation was inferred from the difference in activity measured between the heated and unheated treatments for each sample. The difference in activity that each clone exhibited between the two pretreatment conditions was then expressed as a fold induction, where 1-fold denotes no change in activity. The heat-dependent increases (or decreases) in enzyme activity were then recorded, and the number of clones falling into each category was plotted in the histogram of FIG. 23. As shown in FIG. 23, the range of behaviors (temperature sensitivity) among the clones in the library is centered on the behavior of the parental clone, in this case endoglycanase P54583 carrying the mini-intein mTth010 at position S56, which also exhibited an increase in activity of ~10% (i.e., 1.1-fold induction, cf. FIG. 23) when pre-incubated at about 52.5 °C.
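The fold-induction calculation and the histogram of FIG. 23 can be sketched as follows. Each clone's triplicate readings from the heated (52.5 °C) and unheated treatments are averaged and divided, and the resulting folds are counted in fixed-width bins. The 0.1 bin width, the omission of background subtraction and the function names are assumptions.

```python
import math
from collections import Counter
from statistics import mean


def fold_induction(heated, unheated):
    """Mean heated activity divided by mean unheated activity; 1.0 means no change."""
    return mean(heated) / mean(unheated)


def bin_folds(folds, width=0.1):
    """Count clones per fold-induction bin, keyed by each bin's lower edge."""
    # The small epsilon keeps values such as 0.3 / 0.1 = 2.999... in the correct bin.
    edges = (round(math.floor(f / width + 1e-9) * width, 6) for f in folds)
    return dict(sorted(Counter(edges).items()))
```

Binning on the fold rather than on the raw difference makes clones with different expression levels comparable, which is why a parental clone at about 1.1-fold can serve as the center of the distribution.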
Scoring the degree of temperature sensitivity among these approximately 4,000 clones allowed candidates to be identified for further analysis. Clones from a library designated "Library 14" (Lib14; SCBJ cells bearing pAL410 P54583 S56Tth139 derivatives) were analyzed. The clones that showed the greatest difference in activity in the experiment described with reference to FIG. 23 were analyzed further, and a portion of the data is shown in the graph of FIG. 24. Selected clones include the mutant-modified enzymes indicated in Table 9, below. In FIG. 24, activity from samples treated at room temperature is indicated by the left bar for each mutant, and activity from heat-treated samples by the right bar. The error bars in FIG. 24 reflect the differences in activity among the triplicate assays. In these assays, the wild-type P54583 positive controls and the pAL410 negative controls typically exhibited modest decreases in activity following pre-incubation at elevated temperature. Accordingly, none of these control samples appears in FIG. 24 among the 40 clones exhibiting the greatest increase in activity.
Table 9

Individual clones were picked from the above set and colony-purified. Fresh cultures (in YPD G418) were grown from 3 single colonies derived from each clone, and these cultures were subjected to the Enzchek assay for temperature-sensitive endoglycanase activity, constituting a second assay of the above candidates. Subsequently, one of the 3 single colonies used for the second assay was used to inoculate 3 separate 1 ml volumes of YPD G418, which were grown at 30 °C and tested using the Enzchek assay, constituting a third assay of the candidates. In each case, the fold increase in activity was calculated, making it possible to determine how reproducibly each clone performed. Such a comparison is shown in FIG. 25 for six of the clones picked from this library. In FIG. 25, Assay 1 refers to the initial result for each clone from the high-throughput screen. The data from this assay correspond to a single culture, from which 6 technical replicates (3 preheated, 3 unheated) were generated and tested. The data from Assay 2 reflect 3 biological samples (single cultures derived from 3 separate colonies), from which duplicate samples were prepared (one preheated, one unheated), each of which was then divided into two technical replicates before the assay. Assay 3 reflects results with cultures derived from single colonies purified from the initial cultures that had been examined in Assay 1. In Assay 3, results are averages of a minimum of 12 assays (6 pre-incubated at room temperature and 6 pre-incubated at 52.5 °C), with each set of 6 corresponding to two technical replicates for each of a minimum of three biological replicates. These results suggest that the initial screen may slightly exaggerate the degree of change in activity that can be recovered from a given clone, although each of the candidates shown in FIG. 25 showed >1.5-fold induction in subsequent assays.
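One way to combine the nested replicates described for Assays 2 and 3 is sketched below: technical replicates are averaged within each biological replicate, a fold induction is computed per biological replicate, and the biological folds are summarized by their mean and standard deviation. This order of averaging, and the function name clone_summary, are assumptions; the text does not specify how the replicates were combined.

```python
from statistics import mean, stdev


def clone_summary(bio_reps):
    """bio_reps: list of (heated_technical_reads, unheated_technical_reads) pairs,
    one pair per biological replicate. Returns (mean fold, SD of the folds)."""
    folds = [mean(heated) / mean(unheated) for heated, unheated in bio_reps]
    spread = stdev(folds) if len(folds) > 1 else 0.0
    return mean(folds), spread
```

Summarizing at the biological-replicate level keeps a single noisy culture from dominating the estimate, which is the kind of effect that could make a one-culture screening result look more favorable than later assays.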
Portions of the DNA sequences encoding the intein-modified endoglycanases were isolated by colony PCR from several of the candidates identified in the original screen of Library 14. Examination of the intein-encoding regions of each clone presented in FIG. 25 showed that each carried a mutation causing at least one amino acid change within the mTth010 mini-intein sequence, and one of the clones also carried a mutation resulting in an amino acid change in the adjacent N-extein sequence. These mutations are listed in Table 10, below.
Table 10
Numbering relative to that of mTth010.
† Numbering relative to the mature form of endoglycanase P54583.
In the examples summarized in Table 10, only the regions in the immediate vicinity of the intein were sequenced. Notably, two independent clones were recovered with the same mutation in the intein (R55C in both Lib14 AA0057.F3 and Lib14 AA0057.D5).
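Individual intein codons, such as R55 above or R51 in the saturation-mutagenesis libraries of this example, can be randomized with a degenerate codon. The text does not say which scheme was used; a common choice is NNK (N = A/C/G/T, K = G/T). The sketch below assumes that scheme, expands it, and checks with the standard genetic code that its 32 codons cover all 20 amino acids while including a single stop codon (TAG). The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Subset of IUPAC nucleotide codes used by common degenerate codons.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def expand(pattern):
    """List every codon matched by a degenerate pattern such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in pattern))]


def coverage(pattern):
    """Return (number of codons, amino acids encoded, stop codons included)."""
    codons = expand(pattern)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), amino_acids, stops
```

Compared with a fully random NNN codon (64 codons, three stops), NNK halves the number of codons to screen and reduces the stop-codon frequency, which matters when every clone is assayed individually.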
Additional libraries were built in which a single amino acid within the full-length Tth intein was targeted for saturation mutagenesis. Previous results with intein mutagenesis in a xylanase (SwissProt accession number P77853) revealed that mutations affecting arginine 51 of the intein, when Tth was inserted at certain positions of P77853, gave the intein-modified xylanase a highly temperature-sensitive switching phenotype. To test whether a similar mutation could cause temperature-sensitive behavior in intein-modified endoglycanases, random mutations were introduced at position R51 of the Tth intein, with the inteins carried at each of positions S56, C75, S192 and S237 of P54583. Libraries of yeast clones expressing intein-modified endoglycanases with these mutations were then screened with the same high-throughput Enzchek assay described above. The data were ranked to identify the clones expressing the enzymes with the strongest temperature-sensitive induction. As shown in FIG. 26, the candidates arising from this screen exhibited modest (1.5- to 2-fold) induction of activity upon pretreatment. Most of the best performers were derived from clones carrying the inteins at either position S192 or S56 of P54583.
Example 12 - Termite endoglycanases
An endoglycanase from Nasutitermes takasagoensis has been modified with an intein such that the intein compromises endoglycanase activity, and excision of the intein (either spontaneously or in response to a stimulus such as a temperature change) reconstitutes the activity of the endoglycanase. The intein-modified endoglycanase can be used in applications that require conditional hydrolysis of cellulosic materials and/or other polysaccharides that can be recognized as substrates by the endoglycanase. A termite-derived endoglycanase may have advantageous pH tolerance, expression and/or higher specific activity compared to other endoglycanases. For example, a pH-inducible intein can be inserted into the endoglycanase.
Termites naturally metabolize a variety of lignocellulosic materials owing to their unique anatomy, physiology and symbiotic microflora. As termites consume lignocellulosic materials, they mix the particulate matter with a variety of enzymes. Passing through the termite gut, the materials encounter pH changes ranging from slightly acidic to strongly basic. The particles are then assimilated by symbionts that populate the termite gut and are subsequently metabolized. Exchanges of organic metabolites between the symbionts and the termites provide a means by which termites obtain indirect nutritional benefit from the ingested materials.
Not all of the digestive enzymes responsible for degrading lignocellulosic materials in termites are of microbial origin. Some of the most active enzymes in the termite system are actually expressed and secreted by the termites themselves and subsequently assimilated by the symbionts together with the particulate material. In some termite species, such as Reticulitermes speratus or Mastotermes darwiniensis, endoglycanases are secreted from the salivary glands and mixed with the wood material during chewing, after which they pass into the gut and are then assimilated by symbionts. In other species, such as Nasutitermes takasagoensis, the enzymes are secreted directly into the midgut. FIG. 27 shows the phylogeny of termite endoglycanases.
Comparison of the amino acid sequences of the catalytic domains of a variety of glycosyl hydrolase family 9 (GH9) endoglycanases reveals considerable similarity among enzymes derived from termites (Nasutitermes, Reticulitermes), microbes and plants. As shown, endoglycanases (EC 3.2.1.4) expressed by both primitive and more apical termites share significant homology not only with each other but also with enzymes derived from bacteria and plants. Unlike many members of the GH9 enzyme family, termite endoglycanases typically lack carbohydrate-binding domains and consist only of the catalytic domain. NtEG, an endoglycanase from Nasutitermes takasagoensis, can be expressed in E. coli as a functional enzyme. Differences in cellulolytic activity between the native form and a C-terminally tagged form of a cellulase derived from Coptotermes formosanus and expressed in E. coli allowed the in vitro evolution of enzyme derivatives with improved properties, such as thermostability. Family shuffling, with random exchanges of non-conserved amino acid residues among four parental termite cellulases, also improved thermostability. Any of these cellulases can be modified with an intein as outlined here.
The NtEG endoglycanase has been shown to be structurally stable under very acidic conditions. This may reflect the fact that, as mentioned earlier, termite-derived endoglycanases are exposed to a wide pH range in the gut. The main endoglycanase of Nasutitermes takasagoensis (NtEG) has been crystallized, and it undergoes only very subtle changes in structure across pH values from 6.5 to 2.5. Intein-modified termite-derived endoglycanases can be supplied under conditions involving exposure to strong pH changes.
Example 13 - Expression and characterization of termite endoglycanases
A codon-optimized version of NtEG (O77044, SEQ ID NO: 2017) was prepared. The NtEG DNA sequence, as optimized for expression in plants, is shown below. The first 48 bp of this sequence encode an N-terminal polypeptide of about 16 amino acids, which probably functions as a secretion signal when the protein is expressed in termite cells.
NtEG optimized codon ATGAGGGTGTTCCTTTGCCTGCTCTCGGCGCTAGCTTTGTGCCAGGCGGCTT ACGACTACAAGCAGGTGTTGCGGGACTCGCTACTATTCTATGAGGCCCAGAG ATCCGGCCGGCTCCCAGCCGACCAGAAGGTCACGTGGAGGAAGGATAGCGC GCTGAATGACCAGGGTGACCAGGGACAAGACTTGACCGGCGGCTACTTTGAC GCTGGGGACTTCGTCAAGTTCGGGTTCCCCATGGCTTATACCGCAACCGTGC TGGCATGGGGCCTCATAGATTTTGAGGCCGGCTACAGCAGTGCCGGGGCCTT GGATGATGGACGGAAGGCTGTCAAATGGGCCACCGACTATTTCATAAAGGCC CACACAAGTCAAAATGAGTTCTATGGTCAGGTCGGCCAGGGTGACGCCGATC ACGCTTTCTGGGGAAGACCAGAGGATATGACGATGGCGCGCCCGGCGTACA AGATAGACACCTCAAGGCCTGGCTCTGATCTGGCAGGCGAGACAGCGGCTGC TCTTGCCGCTGCTTCAATCGTGTTCCGGAACGTCGATGGCACTTACTCAAATA ACCTGTTAACACACGCTCGCCAGCTATTCGACTTCGCGAACAACTACCGGGG AAAGTATAGTGACTCTATTACTGACGCAAGAAATTTCTACGCAAGCGCAGAC TACAGAGACGAGTTGGTTTGGGCTGCTGCGTGGTTATACAGAGCGACCAACG ACAACACCTACCTCAACACTGCTGAGTCACTGTACGATGAGTTTGGGCTACA GAACTGGGGGGGGGGCCTGAACTGGGATAGCAAGGTGTCTGGCGTGCAGGT GTTGTTGGCCAAGCTTACCAATAAGCAGGCCTACAAGGACACGGTGCAGTCT TACGTCAATTACCTAATTAATAACCAGCAGAAGACTCCCAAGGGCCTCCTCTA CATCGACATGTGGGGCACCCTTC GCCACGCTGCCAACGCCGCATTCATCATG CTCGAAGCCGCCGAGCTGGGCTTGTCCGCCTCCTCTTATAGACAGTTCGCGC AAACGCAAATCGACTACGCCCTGGGCGATGGTGGCCGCTCCTTTGTGTGCGG GTTCGGGAGTAATCCTCCTACGAGACCGCACCACAGATCCTCGTCGTGCCCG CCAGCTCCCGCTACTTGCGACTGGAATACATTCAACTCACCTGACCCAAACT ACCACGTCCTCTCTGGGGCCCTAGTGGGCGGACCTGATCAGAATGACAACTA CGTCGATGACCGTTCAGACTATGTTCACAACGAAGTCGCCACTGATTACAAC GCGGGTTTCCAGTCCGCGTTAGCTGCTTTGGTGGCCCTTGGTTAC (SEQ ID NO: 2017)
The DNA fragment carrying this sequence was ligated into the Saccharomyces cerevisiae expression vector pAL410. The resulting construct, pAL410 NtEG, is illustrated in FIG. 28. In FIG. 28, P-GAP is the nominally constitutive yeast GAP promoter; alpha is the yeast alpha mating factor secretion signal, which is translated as an N-terminal fusion to the termite-derived endoglycanase; NtEG-SP is the putative 16-amino-acid signal sequence, which may drive NtEG secretion from termite cells; BAA33708 NtEG is the remainder of the coding sequence for the termite endoglycanase; CYCt is a transcription terminator and polyadenylation signal derived from the yeast CYC1 gene; f1 ori is the sequence for generating single-stranded plasmid derivatives; KanMX is a gene that confers resistance to G418 in yeast; 2u ori is the 2-micron origin, allowing plasmid replication in yeast cells; bla is a gene that confers resistance to ampicillin in bacterial cells; and ColE1 is a region that allows replication of the plasmid in E. coli.
It is possible that the two signal peptides, one derived from yeast and the other native to NtEG, conflict during expression from pAL410. To determine whether NtEG expression could be increased by removing the native signal peptide, a derivative of the NtEG expression vector was prepared that differed from the original vector only in lacking the first 48 base pairs of the NtEG open reading frame. These 48 base pairs encode the native signal peptide. This vector (pAL410 NtEGm) was introduced into yeast cells.
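Removing the native signal peptide amounts to deleting the first 48 bp (16 codons) of the NtEG open reading frame. The sketch below checks this on the first 66 bp of the codon-optimized sequence given above: the deleted 48 bp translate to the putative 16-residue signal peptide, and the retained sequence stays in frame. It assumes, as described for FIG. 28, that the trimmed reading frame is fused in frame to the yeast alpha-factor leader, so no new start codon is added. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 66 bp of the codon-optimized NtEG sequence (SEQ ID NO: 2017).
NTEG_5PRIME = (
    "ATGAGGGTGTTCCTTTGCCTGCTCTCGGCGCTAGCTTTGTGCCAGGCG"  # 48 bp, putative signal peptide
    "GCTTACGACTACAAGCAG"                                # start of the retained frame
)


def translate(dna):
    """Translate complete codons with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def remove_signal(orf, n_bp=48):
    """Drop the first n_bp bases; refuse deletions that would shift the reading frame."""
    if n_bp % 3:
        raise ValueError("deletion would shift the reading frame")
    return orf[n_bp:]


if __name__ == "__main__":
    print(translate(NTEG_5PRIME[:48]), translate(remove_signal(NTEG_5PRIME)))
```

Requiring the deletion to be a multiple of three is what keeps the mature NtEG sequence in frame with the alpha-factor fusion.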
Yeast cells carrying pAL410, pAL410 NtEG or pAL410 NtEGm were streaked onto YPD agar plates containing 100 mg/L G418, over which a layer of 1.5% agarose containing 0.2% AZCL-HE-cellulose (Megazyme International Ireland Ltd.) had been applied. As shown in FIG. 29, endoglycanase activity could be detected more readily in the vicinity of colonies carrying pAL410 NtEGm, indicating both that the enzyme was active and that it was being secreted from the growing cells.
Yeast cells carrying plasmids pAL410 NtEG, pAL410 NtEGm or pAL410-P54583 (Ace 1 endoglucanase, see Example 7), as well as a control strain carrying the empty pAL410 vector, were then grown in rich medium, and the culture supernatants were tested for endoglucanase activity using the Cellazyme C assay (Megazyme International Ireland Ltd.), which measures the release of dye (absorbance at 590 nm) from AZCL-HE-cellulose. As shown in FIG. 30, the mature form of termite endoglucanase (NtEGm) clearly gives higher activity than the full-length form, which retains the native signal sequence. NtEGm also exhibits higher activity than P54583. Although the activity of both NtEGm and P54583 increases with temperature, NtEGm lost activity when incubated at 70 °C, whereas P54583 activity continued to increase. These assays revealed that NtEGm expression produced more detectable endoglucanase activity than did P54583 expression.
As a preliminary measure of the pH tolerance of the expressed enzymes, supernatants were collected from cultures expressing either NtEGm or P54583. Because of its lower overall activity, the supernatant from the P54583 culture was concentrated 20-fold by filtration through Milliconde filters with a 10,000 molecular-weight cutoff (Millipore, Bedford MA) before the assay. Cellazyme C assays were then carried out in buffers of different pH and at different temperatures. As shown in FIG. 31, NtEGm exhibited higher activity at pH 4.5 and pH 8.0 (as measured by absorbance at 590 nm of the released dye) than did P54583. This trend held when the reactions were incubated at 40 °C or 58 °C. As shown earlier, however, P54583 activity exceeded that of NtEGm at 70 °C under both pH conditions.
The effects of pH on enzyme stability, as opposed to its effects on enzyme activity (catalysis), were analyzed as follows. P54583 and NtEGm were prepared from culture supernatants as described above. The preparations were then exposed to buffers of different pH for 1 hour. After this treatment, the buffers were exchanged for assay buffer (pH 4.5) by filtration through Ultracel YM-30 regenerated cellulose filters (Millipore). The results of these tests suggest that NtEGm withstands pretreatment at pH values as high as 10.5 but is less resistant to pretreatment at pH 2 or pH 3 (data not shown).
To determine whether a His tag could be added to NtEGm and whether it affected activity, a version of pAL410 NtEGm was created in which six histidine codons were introduced immediately before the stop codon of the NtEGm coding sequence. This plasmid, pAL410 NtEGmHis, was introduced into yeast cells. Supernatants were then collected from cultures of yeast cells carrying pAL410, pAL410 NtEGm or pAL410 NtEGmHis and tested for endoglucanase activity as before. These experiments (FIG. 32) suggest that introducing the His tag compromises endoglucanase activity.
Example 14 - Intein modification of termite endoglucanase
A series of protein fusions was made with the Tth intein inserted into NtEG at different positions. The intein insertion sites were determined by the method described herein and were typically adjacent to serines, threonines or cysteines. Sequences encoding the recombinant NtEG proteins were then assembled using a splicing-by-overlap-extension (SOE) PCR strategy, as depicted in FIG. 33 (see also Example 6b). As shown in FIG. 33, primers were designed to anneal to: (A) the sequence encoding the alpha signal peptide in pAL410 NtEGm; (B) a region within the NtEGm coding sequence adjacent to the insertion site (in this case, serine 84); (C) the 5' end of the Tth intein coding sequence; (D) the 3' end of the Tth intein coding sequence; (E) a region within the NtEGm coding sequence adjacent to the insertion site (in this example, this region does not overlap with the one covered by primer C); and (F) a region within the CYC1 terminator sequence of pAL410 NtEGm.
PCR 1 employs primers A and B to amplify a short product that includes the coding sequences for a portion of the alpha-factor signal and the N-terminal portion of the endoglucanase (NtEG-N). The extreme 3' end of PCR product 1 includes a short segment homologous to the extreme 5' end of the Tth intein. PCR 2 employs primers C and D to amplify the coding sequence of the Tth intein. PCR 3 employs primers E and F to amplify the coding sequence for the C-terminal portion of the endoglucanase (NtEG-C), including the "C+1" amino acid (in this case, serine 84), preceded by a short segment homologous to the extreme 3' end of the Tth intein and followed by a portion of the CYC1 terminator (CYC1t) from pAL410. PCR products 1, 2 and 3 were then combined in a single PCR reaction; by virtue of their homology to the ends of the Tth intein, PCR products 1 and 3 annealed to PCR product 2. DNA synthesis and amplification with the outermost primers (A and F) led to assembly of the full-length product, as indicated at the bottom of the diagram.
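The expected product of the three-fragment SOE-PCR can be predicted in silico by fusing fragments at their shared ends. The sketch below is a minimal illustration under stated assumptions: the 15-nt overlaps, the helper names and all sequences are hypothetical and are not the NtEG or Tth intein sequences, and flanking vector sequence (alpha-factor signal, CYC1 terminator) is omitted.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Fuse two fragments whose ends share an identical overlap.

    The longest suffix of `left` that equals a prefix of `right` (at least
    `min_overlap` nt) is used, as in one overlap-extension step.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not share a sufficient overlap")


def soe_assemble(fragments, min_overlap: int = 15) -> str:
    """Predict the full-length SOE-PCR product from ordered fragments."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = join_by_overlap(product, fragment, min_overlap)
    return product


# Hypothetical toy sequences (not the real NtEG or Tth intein sequences).
nteg_n = "GCTTACGATTACAAC"                                # end of NtEG-N
intein = "TGCCTGGCCGAAGGCACCCGCATCTTCGATCCGGTGACCGGTACC"  # toy intein
nteg_c = "AGCGTGAAGCTGTAA"                                # C+1 Ser codon ... stop

pcr1 = nteg_n + intein[:15]   # primers A + B: 3' tail matches intein 5' end
pcr2 = intein                 # primers C + D
pcr3 = intein[-15:] + nteg_c  # primers E + F: 5' tail matches intein 3' end

full = soe_assemble([pcr1, pcr2, pcr3])
print(full == nteg_n + intein + nteg_c)  # True
```

Searching from the longest possible overlap downward mirrors the idea that the designed homology regions, not short chance matches, drive annealing; real designs would also check that no unintended overlaps exist.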
To prepare any desired intein-modified NtEG derivative, PCR products must be tailored to each intein insertion position. However, some components of this experimental arrangement are modular. For example, primers C and D can be used to prepare PCR product 2, which can then be used to assemble any of the planned recombinants. Similarly, primers A and F can be used to prepare PCR products 1 and 3, respectively, regardless of the insertion position. As such, only primers B and E are unique to a given intein insertion event. Table 11 below lists the sequences (in 5'-3' orientation) of the oligonucleotide primers that were used to assemble each of the intein-modified NtEG endoglucanases. Although primers B and E are unique to each product, each contains a region that is homologous to a terminus of the Tth intein. This constant region is underlined in each primer sequence in Table 11.
Table 11



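The modular design of primers B and E can be sketched as follows: primer B is a reverse primer that anneals just upstream of the insertion site and carries a 5' tail complementary to the intein's 5' end, and primer E is a forward primer whose 5' tail matches the intein's 3' end, followed by gene sequence starting at the C+1 codon. This is a minimal sketch, assuming hypothetical annealing and tail lengths; it does not reproduce the Table 11 primers, and the helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_site_primers(gene: str, intein: str, site: int,
                        anneal_len: int = 20, tail_len: int = 15):
    """Return hypothetical (primer_B, primer_E), both written 5'->3'.

    site: 0-based index of the first nucleotide of the C+1 codon in `gene`.
    The intein-derived tails form the constant region shared by all B/E primers.
    """
    if site < anneal_len or site + anneal_len > len(gene):
        raise ValueError("not enough flanking gene sequence around the site")
    upstream = gene[site - anneal_len:site]
    downstream = gene[site:site + anneal_len]
    primer_b = revcomp(upstream + intein[:tail_len])  # reverse primer for PCR 1
    primer_e = intein[-tail_len:] + downstream        # forward primer for PCR 3
    return primer_b, primer_e


# Toy example with short, hypothetical sequences and lengths.
gene = "ATGGCTTACGATTACAACAGCGTGAAGCTGTAA"  # AGC (Ser) codon starts at index 18
intein = "TGCCTGGCCGAAGGCACC"
b, e = design_site_primers(gene, intein, site=18, anneal_len=9, tail_len=6)
print(b)  # CAGGCAGTTGTAATC
print(e)  # GGCACCAGCGTGAAG
```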
The insertion sites listed in Table 11 refer to the identity and relative position of the amino acid residue at the C+1 position of the extein. The numbering is relative to the amino acid sequence of the predicted NtEGm polypeptide, with positions 2-5 corresponding to amino acids 17-20 (Ala-Tyr-Asp-Tyr) of the native NtEG sequence (O77044) (SEQ ID NO: 112).
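The stated correspondence (NtEGm positions 2-5 = native residues 17-20) implies a constant offset of 15 between the two numbering schemes. The sketch below converts between them, assuming the offset holds along the whole chain, as expected for a simple N-terminal truncation; the function names are illustrative.

```python
# Offset implied by the text: NtEGm positions 2-5 = native NtEG residues 17-20.
NATIVE_OFFSET = 17 - 2  # = 15


def ntegm_to_native(pos: int) -> int:
    """Map an NtEGm residue number to the native NtEG numbering."""
    if pos < 2:
        raise ValueError("the stated correspondence starts at NtEGm position 2")
    return pos + NATIVE_OFFSET


def native_to_ntegm(pos: int) -> int:
    """Inverse mapping; native residues before 17 have no NtEGm counterpart here."""
    if pos < 17:
        raise ValueError("residue is outside the stated correspondence")
    return pos - NATIVE_OFFSET


for site in (84, 303, 325, 333):  # Tth insertion sites tested in this example
    print(site, ntegm_to_native(site))
# 84 99, 303 318, 325 340, 333 348
```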
Using the above primers, SOE-PCR reactions were performed. A subset of these recombinant PCR products was ligated into pCR-Blunt II-TOPO (Invitrogen, Carlsbad CA), sequenced to confirm their composition, and then transferred to the yeast expression vector pAL410. Supernatants were collected from cultures of yeast cells carrying pAL410, pAL410 NtEGm, or pAL410 NtEGm with the Tth intein inserted adjacent to serine 84, threonine 303, serine 325 or threonine 333. These supernatants were then examined in the Cellazyme C assay, and endoglucanase activity was monitored as an increase in absorbance at 590 nm (due to release of dye from the AZCL-HE-cellulose substrate) as a function of time. FIG. 34 shows that insertion of the Tth intein at any of the four tested positions strongly reduces enzyme activity.
Example 15 - P77853 xylanases modified with intein
Intein selection
Intein amino acid sequences were selected from the InBase database (version 7/2007). This database contained several trans-splicing inteins, whose split sequences were merged, and the list was reduced to 408 inteins. The following inteins were removed: the Mth RIR1 intein, the Tth-HB8 DnaE-1 intein, the Tth-HB27 DnaE-1 intein, Tag Pol-3 (Tsp-TY Pol-3), Tac-ATCC25905 VMA and the Psp-GBD Pol intein. The following inteins from pathogenic species were also removed: MTU (5 inteins), Mch (1), Mma (1), Mbo (5), Mfa (1), Mfl (2), Mga (3), Mgo (1), Min (1), Mkas (1), Mie (4), Msh (1), Msm (2), Msp (4), Mthe (1), Mtu (5), Mvan (2), Mxe (1). From the remaining 361 sequences, sequences sharing >62% sequence identity with another remaining sequence were removed. For example, for a set of sequences A, B, C and D that all shared >62% identity, three of them would be eliminated. For a pair of similar sequences, the less thermophilic one was removed, using the order of thermophilicity hyperthermophilic > thermophilic > mesophilic = UNK. The sequences were classified by the optimal growth conditions of their host organisms, using the Prokaryotic Growth Temperature Database (PGTdb) and other bibliographic sources. Hyperthermophiles were defined as organisms with an optimal growth temperature above 80 °C, thermophiles as 45-80 °C, and mesophiles as below 45 °C. The UNK classification was used for organisms that could not be classified. After this process, 157 sequences were left for testing: 70 from hyperthermophilic organisms, 19 from thermophilic organisms, 64 from mesophilic organisms and 4 from organisms in the unknown group.
Construction of intein-modified P77853
The DNA sequences of all selected inteins were codon-optimized for Zea mays (corn) by GenScript. The inteins were then checked for the following restriction sites to make sure they were absent: GAATTC (EcoRI), CTCGAG (XhoI) and CATATG (NdeI). Several of the sequences had NdeI sites, which were mutated so that they encoded the same pair of amino acids as the codons of the original NdeI site. SEQ ID NOS: 2059-2215 list the intein sequences used after codon optimization and removal of EcoRI, XhoI or NdeI sites, if present. One sequence had an XhoI site, which was mutated to CTGGAG. The amino acid sequences encoded by SEQ ID NOS: 2059-2215 are given in SEQ ID NOS: 2216-2372, respectively. All inteins were then inserted into a plasmid containing the codon-optimized coding sequence for the enzyme P77853 (SEQ ID NO: 104). The insertion site was immediately before codon T134 or codon S158; the sequences below show the codon-optimized nucleic acid encoding P77853, the plasmid nucleic acid sequence and the intein insertion point. In addition, SEQ ID NOS: 2687-3000 list each of the nucleic acid sequences encoding the intein-modified P77853 used in this example. The experiments described below list samples AS-1 through AS-157 and AT-1 through AT-157. SEQ ID NOS: 2373-2529 correspond to the protein amino acid sequences of samples AS-1 through AS-157, respectively. SEQ ID NOS: 2530-2686 correspond to the protein amino acid sequences of samples AT-1 through AT-157, respectively. SEQ ID NOS: 2687-2843 correspond to the nucleic acid sequences that encode the proteins in samples AS-1 through AS-157, respectively. SEQ ID NOS: 2844-3000 correspond to the nucleic acid sequences that encode the proteins in samples AT-1 through AT-157, respectively. SEQ ID NOS: 3001-3157 correspond to the nucleic acid sequences that encode the proteins of samples AS-1 through AS-157, respectively, in pBluescript.
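The restriction-site check and the synonymous repair of the XhoI site can be sketched as below. This is a minimal illustration, not the vendor's pipeline: it assumes the site is read in frame when judging synonymy, and the helper names are hypothetical. All three screened sites are palindromic, so scanning one strand is sufficient.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Sites screened in this example (all palindromic, so one strand suffices).
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG", "NdeI": "CATATG"}


def find_sites(seq: str):
    """Return (enzyme, 0-based position) for every screened site in `seq`."""
    seq = seq.upper()
    hits = []
    for name, motif in SITES.items():
        pos = seq.find(motif)
        while pos != -1:
            hits.append((name, pos))
            pos = seq.find(motif, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence (a trailing partial codon is ignored)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def is_synonymous(original: str, mutated: str) -> bool:
    """True if both sequences have the same length and encode the same peptide."""
    return len(original) == len(mutated) and translate(original) == translate(mutated)


# The XhoI change described above, assuming the site is read in frame.
print(translate("CTCGAG"), translate("CTGGAG"))  # LE LE
print(is_synonymous("CTCGAG", "CTGGAG"), find_sites("CTGGAG"))  # True []
```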
SEQ ID NOS: 3158-3314 correspond to the nucleic acid sequences that encode the proteins in samples AT-1 through AT-157, respectively, in pBluescript. The two sequences below contain P77853 (lowercase letters) inserted into the plasmid pBluescript (uppercase letters), with the intein insertion site shown inside double angle brackets. See SEQ ID NOS: 2059-2215 for the nucleic acid sequences encoding the inserted inteins, and SEQ ID NOS: 2216-2372 for the respective encoded intein amino acid sequences.
Plasmid for S158 insertion in P77853
GCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATA CATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATA ATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTC CCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTG AAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAAC TGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTT TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGT ATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATG ACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGAC AGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCC AACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGC ACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAA TGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCA ACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGC AACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCG CTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAG CGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAA ATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTC AGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATT TAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTT AACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGG ATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAA AACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCT TTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTT CTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTA
CATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAA GTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAG CGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACG ACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGC TTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAA CAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATA GTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTC GTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACG GTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCC CTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCG CCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGA GCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGC AGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCA ATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCT TCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGA AACAGCATatgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactct ggaaggatactggcaatacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataac gcgttgtttaggaccgggaagaaatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgact tacaacccaaacgggaactcctacttgtgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagt cctgggggaactggagaccgcctggtgccacgtccctgggccaagtgacaatcgatggcgggacctacgacatctat aggacgacacgcgtcaaccagcct «insert.intein.here.for.S158» tccattgtggggacagccacgtt cgatcagtactggagcgtgcgcacctctaagcggacttcaggaacagtgaccgtgaccgatcacttccgcgcctgggc gaaccggggcctgaacctcggcacaatagaccaaattacattgtgcgtggagggttaccaaagctctggatcagcca acatcacccagaacaccttctctcagggctcttcttccggcagttcgggtggctcatccggctccacaacgactactcgc atcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcaccaatccctttaatggtattgcgctgtacg ccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaatttccgcctgcggggttgcggca acaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggaccttttattaccagggcacatacc cctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgttactgcggataac ggcacatgggacgtgtatgccgactacctggtgatacagtgaCTCGAGGGGGGGCCCGGTACCCA ATTCGCCCTATAGTGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGT CGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATC
CCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTC CCAACAGTTGCGCAGCCTGAATGGCGAATGGAAATTGTAAGCGTTAATATTT TGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGT TGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTC CAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCACTACGTGA ACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAAT CGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCG AACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGC GCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCT TAATGCGCCGCTACAGGGCGCGTCAGGTG (SEQ ID NOS: 3001-3157)
Plasmid for T134 insertion in P77853
GCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATA CATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATA ATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTC CCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTG AAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAAC TGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTT TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGT ATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATG ACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGAC AGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCC AACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGC ACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAA TGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCA ACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGC AACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCG CTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAG CGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAA ATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTC AGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATT TAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTT AACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGG ATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAA AACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCT TTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTT
CTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTA CATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAA GTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAG CGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACG ACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGC TTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAA CAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATA GTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTC GTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACG GTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCC CTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCG CCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGA GCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGC AGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCA ATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCT TCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGA AACAGCATatgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactct ggaaggatactggcaatacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataac gcgttgtttaggaccgggaagaaatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgact tacaacccaaacgggaactcctacttgtgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagt cctgggggaactggagaccgcctggtgcc «insert.intein.here.for.T134» acgtccctgggccaagtg acaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagccttccattgtggggacagccacgtt cgatcagtactggagcgtgcgcacctctaagcggacttcaggaacagtgaccgtgaccgatcacttccgcgcctgggc gaaccggggcctgaacctcggcacaatagaccaaattacattgtgcgtggagggttaccaaagctctggatcagcca acatcacccagaacaccttctctcagggctcttcttccggcagttcgggtggctcatccggctccacaacgactactcgc atcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcaccaatccctttaatggtattgcgctgtacg ccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaatttccgcctgcggggttgcggca acaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggaccttttattaccagggcacatacc cctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgttactgcggataac ggcacatgggacgtgtatgccgactacctggtgatacagtgaCTCGAGGGGGGGCCCGGTACCCA ATTCGCCCTATAGTGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGT
CGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATC CCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTC CCAACAGTTGCGCAGCCTGAATGGCGAATGGAAATTGTAAGCGTTAATATTT TGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGT TGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTC CAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCACTACGTGA ACCATCACCCTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAAT CGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCG AACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGC GCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCT TAATGCGCCGCTACAGGGCGCGTCAGGTG (SEQ ID NOS: 3158-3314)
Switching tests
pBluescript plasmids encoding intein-modified P77853 were transformed into E. coli TOP10 (Invitrogen) and plated on LB agar supplemented with ampicillin (100 mg/L). After overnight incubation at 37 °C, eight colonies (biological replicates) were picked for each construct and inoculated into 1 mL of autoinduction medium (AIM, Novagen) supplemented with carbenicillin (100 mg/L) in 96-well plates. Cultures were grown at 900 rpm in a Multitron shaker (Infors HT) at 37 °C for 10 hours, then at 30 °C for 6-8 hours. Cells were harvested and lysed in 100 µL of poly-buffer (at pH 4.5, 5.5, 6.5 or 7.5) containing 10% of 10× FastBreak (Promega) and Benzonase (0.1 µL/mL, 25KUN, Novagen) at 30 °C for 1 hour. The lysate was diluted with poly-buffer (at the same pH as the lysis buffer) to a final volume of 1 mL and divided for heat treatment. Heat treatment temperatures were 37 °C, 50 °C, 55 °C or 60 °C, for 2, 4 or 6 hours. The samples were then placed on ice. Xylanase activity was assayed with the solid substrate AZCL-birch xylan (Megazyme):lactose (25%:75%), distributed into 384-well plates with a VP724B powder dispenser (V&P Scientific), in a reaction mixture of 30 µL of lysate and 40 µL of poly-buffer (at the same pH as the lysis buffer) at 37 °C for 30, 45, 65 or 100 min. Absorbance was read at 590 nm in a Paradigm plate reader (Beckman Coulter).
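The volumes above fix how much culture each assay well represents. The short calculation below makes that scaling explicit, assuming that all cells from the 1 mL culture are harvested, lysed and carried into the diluted lysate, and that splitting the lysate for heat treatment does not change its concentration.

```python
# Volumes taken from the protocol above (µL). Assumes every cell from the
# 1 mL culture is harvested, lysed and carried into the diluted lysate.
CULTURE_UL = 1000
LYSIS_UL = 100
DILUTED_LYSATE_UL = 1000
LYSATE_PER_WELL_UL = 30
BUFFER_PER_WELL_UL = 40

lysate_dilution = DILUTED_LYSATE_UL / LYSIS_UL  # fold dilution after lysis
culture_equiv_per_well = LYSATE_PER_WELL_UL * CULTURE_UL / DILUTED_LYSATE_UL  # µL of culture
reaction_volume = LYSATE_PER_WELL_UL + BUFFER_PER_WELL_UL  # µL per assay well

print(lysate_dilution, culture_equiv_per_well, reaction_volume)  # 10.0 30.0 70
```

Under these assumptions each 70 µL reaction contains the enzyme from about 30 µL of the original culture.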
The results of the enzyme assay at pH 6.5 and pH 7.5 are shown in FIGS. 36A-D and 37A-D, for both the T134 and the S158 insertion sites of P77853. FIGS. 36A-D show the results at pH 6.5, and FIGS. 37A-D show the results at pH 7.5. Activities at high and low temperatures are plotted against those of wild-type P77853 (FIGS. 36A and C). High-temperature activity is also plotted against fold induction (high-temperature activity/low-temperature activity) (FIGS. 36B and D). Inteins are distinguished by the thermophilicity of their hosts. The vertical dashed line represents 10% of wild-type activity at low temperature. The horizontal dashed line represents 40% of wild-type activity at high temperature. For both the T134 and the S158 insertion sites, numerous inteins produced high fold induction. However, only the S158 insertions yielded candidates that were close to or met the metrics (low-temperature activity less than or equal to 10% of the native enzyme, i.e., not modified with intein, and high-temperature activity greater than 40% of the native enzyme).
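The two screening metrics and the fold-induction ratio can be written as a small check. This is a minimal sketch: the function names and the readings are hypothetical, and each candidate is compared with wild-type P77853 measured under the same (high- or low-temperature) condition, as the dashed lines in the figures indicate.

```python
def fold_induction(high: float, low: float) -> float:
    """High-temperature activity divided by low-temperature activity."""
    return float("inf") if low == 0 else high / low


def meets_switch_metrics(high: float, low: float, wt_high: float, wt_low: float,
                         max_low_frac: float = 0.10, min_high_frac: float = 0.40) -> bool:
    """True if low-temperature activity is <= 10% of wild type at low temperature
    and high-temperature activity is > 40% of wild type at high temperature."""
    return low <= max_low_frac * wt_low and high > min_high_frac * wt_high


# Hypothetical A590 readings (not data from this example).
wt_high, wt_low = 3.0, 1.0
print(meets_switch_metrics(1.5, 0.08, wt_high, wt_low))  # True
print(meets_switch_metrics(1.5, 0.20, wt_high, wt_low))  # False
print(fold_induction(1.5, 0.08))                         # 18.75
```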
The best intein-modified xylanase candidates (as ranked by overall high-temperature activity) for each pH and insertion site are shown in FIGS. 38A-D, for the set of inteins inserted at position S158 (FIGS. 38A and C) and at position T134 (FIGS. 38B and D) of P77853. For the 20 best candidates, plotted alongside the wild type and the empty vector, activities following heat pretreatment at high temperature (right bar for each sample) and at low temperature (left bar for each sample) are shown at pH 6.5 (FIGS. 38A and B) and at pH 7.5 (FIGS. 38C and D). The dashed line between 2 and 4 on the Activity axis represents 40% of wild-type activity at high temperature. The dashed line below 2 represents 10% of wild-type activity at low temperature. As shown in FIGS. 38A-D, for both the S158 and the T134 insertion sites, the best-scoring candidates show activities after high-temperature treatment that are close to or above 40% of wild-type activity. However, more of the best candidates show high fold induction at the S158 site than at the T134 site. Additionally, a few candidates meet or come very close to the set of metrics (activity <10% of WT before splicing and activity >40% of WT after exposure to splicing conditions) for site S158, including AS-146 and AS-79 at pH 6.5, and AS-79, AS-2 and AS-83 at pH 7.5. All the best candidates for T134 have much higher activity following low-temperature heat treatment.
All candidates for each insertion site and pH were grouped by performance in the switching assay, based on their activity following exposure to high temperature and to low temperature. For activity, the groups are non-permissive (NP, <10% WT activity), weak activity (W, 10-30% WT activity) and strong activity (>30% WT activity). For switching, the groups are permissive (P, ratio of high-temperature to low-temperature activity <2×), switching (S, ratio of 2-3×) and strong switching (SS, ratio >3×). The distribution of candidates across these groups for each pH is shown in Table 12 below, and activity data for representative inteins in each group are shown in FIGS. 39A-D. FIGS. 39A and C show data for P77853 intein inserts at S158; FIGS. 39B and D show data for P77853 intein inserts at T134. FIGS. 39A and B correspond to heat treatments at pH 6.5; FIGS. 39C and D correspond to heat treatments at pH 7.5. The dashed line between 2 and 4 on the Activity axis represents 40% of wild-type activity at high temperature. The dashed line below 2 represents 10% of wild-type activity at low temperature. As shown in FIGS. 39A-D, at both sites a small number of switching inteins have weak or strong activity. However, there are many more permissive inteins at the T134 site than at the S158 site. This is similar to what was previously seen for Tth insertions at these sites, in that inteins inserted at T134 are generally less able to block activity, leaving a higher baseline activity.
Table 12

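The grouping used for Table 12 can be sketched as two threshold functions whose outputs are cross-tabulated. This is a minimal illustration under an explicit assumption: the activity class is computed from high-temperature activity as a percentage of wild type, which the text does not state outright. The readings are hypothetical.

```python
from collections import Counter


def activity_group(pct_wt: float) -> str:
    """Activity class from % of wild-type activity: NP <10, W 10-30, strong >30."""
    if pct_wt < 10:
        return "NP"
    if pct_wt <= 30:
        return "W"
    return "strong"


def switching_group(high: float, low: float) -> str:
    """Switching class from the high/low-temperature activity ratio: P <2, S 2-3, SS >3."""
    ratio = float("inf") if low == 0 else high / low
    if ratio < 2:
        return "P"
    if ratio <= 3:
        return "S"
    return "SS"


# Hypothetical readings: (activity after high-temperature, after low-temperature treatment).
WT_HIGH = 3.0  # hypothetical wild-type activity after high-temperature treatment
candidates = [(0.10, 0.09), (0.60, 0.25), (1.40, 0.30), (1.00, 0.80)]

# Assumption: the activity class uses high-temperature activity as % of wild type.
table = Counter(
    (activity_group(100 * high / WT_HIGH), switching_group(high, low))
    for high, low in candidates
)
print(table)
```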
Referring to FIG. 40, the best candidates from the previous screen (AS-146, AS-2, AS-79, AS-83) were retested for heat-inducible enzyme activity and compared with a non-permissive candidate (AS-8), the wild-type P77853 enzyme and the empty vector pBS. The dashed line above 1 on the Activity axis represents 40% of wild-type activity at high temperature. The dashed line below 0.5 represents 10% of wild-type activity at low temperature. In this retest, AS-79 meets the metrics of >40% activity at high temperature (58%) and <10% activity at low temperature (6%), and AS-83 reaches both thresholds (40% and 10%, respectively). AS-146 came close to the metrics, with slightly lower high-temperature activity (34%) but low-temperature activity still below the threshold (7%). Higher heat-induced activity and lower uninduced activity may be desirable properties for these enzymes.
Western blot
Referring to FIG. 41, the best-performing candidates at the S158 insertion site (AS-2, AS-79, AS-83 and AS-146) and at the T134 insertion site (AT-2, AT-83, AT-149, AT-154) of P77853 were analyzed for heat-inducible splicing by western blot. Culture and lysis conditions were the same as for the switching assay, except that poly-buffer at pH 6.5 was used and heat treatments were at 37 °C and 60 °C for 4 hours. The western blot was developed with a primary rabbit anti-P77853 antibody using standard procedures. In FIG. 41, pBS is the empty-vector control, and P77 is the positive control (P77853). The left and right lanes above each sample label represent the low-temperature (37 °C/4 hours) and heated (60 °C/4 hours) aliquots of the same lysate, respectively. The arrows indicate the intein-modified P77853 precursors, and NC marks the position of the mature protein. AS-83, AS-146 and AS-79 and, to a lesser extent, AT-154, AT-149 and AT-83 show a modest heat-inducible accumulation of mature protein (NC). This is consistent with heat-inducible splicing. However, the extent of heat-inducible splicing alone does not appear to quantitatively account for the heat-inducible enzyme activity of AS-79, AS-83 and AS-146 (compare FIG. 41 with FIG. 40).
Common characteristics of the best candidates
Heat tolerance of the host organism
A higher proportion of highly active candidates and a higher proportion of switching candidates are seen with inteins from hyperthermophilic and thermophilic organisms than with inteins from mesophilic/UNK organisms. This can be seen in the following tables, which break down the data in Table 12 into the distribution of performance in the intein switching assay for inteins from hyperthermophilic and thermophilic organisms (Table 13) and for inteins from mesophilic and UNK organisms (Table 14). In these tables, the data are normalized as a fraction of the total candidates in each thermotolerance group. The fraction of candidates showing high activity at high temperature (FIG. 42A) and switching greater than 2× (FIG. 42B) is compared for inteins from hyperthermophilic/thermophilic organisms (right bar for each of the four sample labels) and inteins from mesophilic/UNK organisms (left bar). By chi-square analysis, both activity and switching differ significantly (p < 0.05) between these two groups for T134 inserts, whereas only switching differs significantly for S158 inserts. Using the Wilcoxon rank-sum test on the raw activity data, rather than on the grouped data, significant differences (p < 0.05) were shown for both activity and switching at both sites, and for heat treatments at both pH 6.5 and pH 7.5. Despite the higher number of highly active candidates identified in the thermophilic and hyperthermophilic category, most constructs still fall into the non-permissive, weakly permissive and strongly permissive categories for all intein tests.
Table 13
Table 14
Intein size or presence of an endonuclease domain
Referring to FIGS. 43A and B, differences in activity and switching based on intein length were examined. The fraction of candidates showing high activity at high temperature (FIG. 43A) and switching greater than 2× (FIG. 43B) is compared for inteins <240 amino acids (left bar for each of the four sample labels) and inteins >240 amino acids (right bar for each of the four sample labels). Longer inteins (>240 amino acids) predominantly contain an identified endonuclease domain, whereas shorter inteins do not. Tables 15 and 16 below show the distribution of performance in the intein switching assay for inteins >240 amino acids (Table 15) and for inteins <240 amino acids (Table 16). Whether candidates in these two groups differed significantly was then examined. When the candidates are split into longer and shorter inteins, the high-activity group is larger for shorter inteins at both sites, and switching differs between the T134 and S158 sites. Using the Wilcoxon rank-sum test on the ungrouped raw activity data, only the increase in activity for shorter inteins at the T134 site and the increase in switching for longer inteins at the T134 site were significant (p < 0.05). This may be related to the position of the T134 site relative to the active site of P77853. Because the T134 site is somewhat distant from the active site, a shorter intein may not be large enough to block the active site and would therefore leave higher activity before splicing. Conversely, larger inteins may block the active site better because they occupy more space, which would result in higher switching.
The lack of significance in the S158 data could be because this site is considerably closer to the active site, so that small inteins are sufficient to block activity and larger inteins offer no additional advantage.
Table 15
Table 16
Sequence similarities
Most inteins have several conserved domains (also referred to as "blocks"), designated by the letters A, B, C, D, E, F, G and H. Of these blocks, C, D, E and H are often found in the endonuclease domain of most inteins. Neither the top-hit sequences nor their blocks A, B, F and G clustered significantly into a small group relative to the remaining sequences. This suggests that no single strong sequence feature distinguished all the best candidates from the rest of the set. Referring to FIGS. 44A-D, however, inteins that produced top hits (classified as >40% WT activity, or >30% WT activity and >2× switching) at the S158 insertion site were statistically more likely to have similar sequences (BLAST alignment E-value <1e-20) that also produce top hits than were inteins that did not produce top hits. FIGS. 44A-D illustrate sequence similarity among the top hits. FIGS. 44A and C show results for the S158 P77853 intein inserts, and FIGS. 44B and D for the T134 inserts. FIGS. 44A and B show results for heat treatments at pH 6.5, and FIGS. 44C and D for heat treatments at pH 7.5. "None Hits" represents the remaining sequences not in the Top Hits group. FIGS. 44A-D show the fraction of similar sequences (E-value <1e-20) that are also top hits ("Similar Top Hits", left bar in each panel) or non-hits ("Similar Hits", right bar in each panel). These results were statistically significant by chi-square analysis, with p-values below 0.05 at both pH 6.5 and pH 7.5. This suggests that sequences similar to the top hits are more likely than the set as a whole to produce good candidates (at least for the S158 insertion site). As a result, it may be useful to include at least the sequences that are close to the top hits in the embodiments herein.
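The chi-square and Wilcoxon rank-sum tests cited in this example can be sketched with the standard library alone. The example does not state which software or options were used; for comparison, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, and `scipy.stats.ranksums` implements the rank-sum test. The sketch below uses no continuity correction and the large-sample normal approximation, and the input numbers are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square for the table [[a, b], [c, d]], without continuity
    correction. Returns (statistic, p-value) using 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square(1) survival function


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test with the large-sample normal approximation.
    Ties receive average ranks; the variance is not tie-corrected."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Hypothetical inputs: a 2x2 count table and two sets of activity readings.
print(chi_square_2x2(12, 8, 10, 40))
print(rank_sum_test([0.9, 1.4, 2.2, 1.8], [0.3, 0.5, 0.7, 1.1, 0.2]))
```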
The top-hit samples included the intein-modified proteins having the sequences of SEQ ID NOS: 2374, 2423, 2424, 2431, 2451, 2455, 2461, 2466, 2467, 2471, 2479, 2483, 2493, 2507, 2510, 2511, 2518, 2531, 2540, 2541, 2543, 2545, 2548, 2569, 2571, 2574, 2575, 2581, 2582, 2584, 2585, 2586, 2587, 2588, 2590, 2591, 2594, 2602, 2608, 2610, 2612, 2613, 2617, 2618, 2619, 2620, 2624, 2626, 2630, 2636, 2637, 2639, 2643, 2645, 2652, 2656, 2657, 2661, 2664, 2666, 2667, 2668, 2678, 2680, 2682 and 2683.
Raw data from the activity assays of the intein-modified enzymes of this example are provided in Table 17, below. The sequence of each intein-modified protein in samples AS-2 to AS-147 and AT-1 to AT-157 listed in Table 17 is provided in SEQ ID NOS: 2374 - 2519 and 2530 - 2686, respectively. Table 17
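The sample-to-sequence correspondence stated above is a fixed offset within each series, which can be sketched as follows. The function names are hypothetical. The nucleic-acid offset of 314 is an inference rather than a statement in the text: it matches the AS-146 pair given in Example 16 (protein SEQ ID NO: 2518, coding sequence SEQ ID NO: 2832) and the paired SEQ ID lists in the claims, but it does not apply to the Example 16 mutants (SEQ ID NOS: 3315 - 3322 and 3323 - 3330).

```python
import re
from typing import Optional

# Series ranges as stated in the text:
#   AS-2 .. AS-147 -> protein SEQ ID NOS 2374 - 2519
#   AT-1 .. AT-157 -> protein SEQ ID NOS 2530 - 2686
SERIES = {"AS": (2, 147, 2374), "AT": (1, 157, 2530)}

# Assumption (inferred, not stated): coding-sequence SEQ ID = protein SEQ ID + 314.
NUCLEIC_OFFSET = 314


def protein_seq_id(sample: str) -> int:
    """Map a Table 17 sample name such as 'AS-146' to its protein SEQ ID NO."""
    match = re.fullmatch(r"(AS|AT)-(\d+)", sample.strip())
    if match is None:
        raise ValueError(f"not a Table 17 sample name: {sample!r}")
    series, number = match.group(1), int(match.group(2))
    first, last, base = SERIES[series]
    if not first <= number <= last:
        raise ValueError(f"{sample} is outside {series}-{first} .. {series}-{last}")
    return base + (number - first)


def nucleic_seq_id(sample: str) -> int:
    """Coding-sequence SEQ ID NO, using the assumed fixed offset."""
    return protein_seq_id(sample) + NUCLEIC_OFFSET


def sample_for_seq_id(seq_id: int) -> Optional[str]:
    """Reverse lookup for a protein SEQ ID NO; None if no Table 17 sample uses it."""
    for series, (first, last, base) in SERIES.items():
        if base <= seq_id <= base + (last - first):
            return f"{series}-{first + seq_id - base}"
    return None


if __name__ == "__main__":
    print(protein_seq_id("AS-146"), nucleic_seq_id("AS-146"))
    print(sample_for_seq_id(2683))
```

For example, the reverse lookup maps the last entry of the top-hit list, SEQ ID NO: 2683, to sample AT-154. Mutant names such as "AS-146-2" are deliberately rejected, because the mutants use a separate numbering.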






Example 16 - Intein-modified protein from Example 15
Sample AS-146 from Example 15 was subjected to mutagenesis. The amino acid sequence of the AS-146 intein-modified protein (SEQ ID NO: 2518) is shown below, followed by the amino acid sequences of eight AS-146 mutants (SEQ ID NOS: 3315 - 3322). The intein amino acid sequence in each of SEQ ID NOS: 2518 and 3315 - 3322 is indicated below by underlining. Mutations in the intein-modified proteins of SEQ ID NOS: 3315 - 3322, relative to the intein-modified protein of SEQ ID NO: 2518, are shown below in larger bold font. Following the amino acid sequences, the nucleic acid encoding the AS-146 intein-modified protein (SEQ ID NO: 2832) is presented, followed by the nucleic acid sequences of SEQ ID NOS: 3323 - 3330, which encode the intein-modified protein mutants of SEQ ID NOS: 3315 - 3322, respectively. The intein coding sequence in each of SEQ ID NOS: 2832 and 3323 - 3330 is indicated below by underlining. Mutations in the coding sequences of the intein-modified proteins in SEQ ID NOS: 3323 - 3330 are shown below in boldface. All results were unexpected, because these inteins were selected for performance without knowing whether they would work. Many had no prior experimental verification of their function as inteins. AS-146 (P77853_Tko_RadA_intein_S158)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 2518) AS-146-2 (4 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFHKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPWISQKTIEGSVYYRVYIM GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3315) AS-146-4 (5 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFEPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMINPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGIAPRISQKTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEAGKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVANLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3316) AS-146-5 (3 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIMPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIEGSVYYRIYIT GEVRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSIVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3317) AS-146-9 (1 ext, 4 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLLFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNSLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQMTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPDLKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSNVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3318) AS-146-11 (2 ext, 1 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQSCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIKGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYY FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3319) AS-146-12 (1 ext, 2 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQSCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPEMVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVDDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3320) AS-146-13 (1 ext, 3 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTIMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDPKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAGMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGGL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDEYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRMEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSTVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYANGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3321) AS-146-16 (1 ext, 2 int)
MQTSITLTSNASGTFDGYYYELWKDTGNTTMTVYTQGRFSCQWSNINNALFRTGKKYNQNWQSLGTIRITYSATYNPNGNSYL CIYGWSTNPLVEFYIVESWGNWRPPGATSLGQVTIDGGTYDIYRTTRVNQPCFAKDTKVYYENDTLVHFESIEDMYHKYASLG REVPFDNGYAVPLETVSVYTFDLKTGEVKRTKASYIYREKVEKLAEIRLSNGYLLRITLLHPVLVFRNGLQWVPAAMIKPGDL IVGIRSVPANAATIEESEAYFLGLFVAEGTSNPLSITTGSEELKDFIVSFIEDHDGYTPTVEVRRGLYRILFRKKTAEWLGEL ATSNASTKVVPERVLNAGESAIAAFLAGYLDGDGYLTESIVELVTKSRELADGLVFLLKRLGITPRISQKTIEGSVYYRIYIT GEDRKTFEKVLEKSRIKPGEMNEGGVGRYPPALGKFLGKLYSEFRLPKRDNETAYHILTRSRNVWFTEKTLSRIEEYFREALE KLSEARKALEMGDKPELPFPWTAITKYGFTDRQVANYRTRGLPKRPELKEKVVSALLKEIERLEGVAKLALETIELARRLEFH EVSSVEVVDYNDWVYDLVIPETHNFIAPNGLVLHNSIVGTATFDQYWSVRTSKRTSGTVTVTDHFRAWANRGLNLGTIDQITL CVEGYQSSGSANITQNTFSQGSSSGSSGGSSGSTTTTRIECENMSLSGPYVSRITNPFNGIALYADGDTARATVNFPASRNYN FRLRGCGNNNNLARVDLRIDGRTVGTFYYQGTYPWEAPIDNVYVSAGSHTVEITVTADNGTWDVYADYLVIQ (SEQ ID NO: 3322) AS-146 (P77853_Tko_RadA_intein_S158) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG
CTACCTGGACGGCGATGGGTACC TCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggctcatccggctc cacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga <SEQ ID NO: 2832), 146-2 AS (2 ext 3 int)
ATGCAAACAAGCATTACTCTGACATCCAACGCATCCGGTACGTTTGACGGTTACTATTACGAGCTCTGGAAGGATACTGGCAA TACAACAATGACGGTCTACACTCAAGGTCGCTTTTCCTGCCAGTGGTCGAACATCAATAACGCGTTGTTTAGGACCGGGAAGA AATACAACCAGAATTGGCAGTCTCTTGGCACAATCCGGATCACGTACTCTGCGACTTACAACCCAAACGGGAACTCCTACTTG TGTATCTATGGCTGGTCTACCAACCCATTGGTCGAGTTCTACATCGTTGAGTCCTGGGGGAACTGGAGACCGCCTGGTGCCAC GTCCCTGGGCCAAGTGACZiATCGATGGCGGGACCTACGACATCTATAGGACGACACGCGTCAACCAGCCTTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGZLATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCACAAGAAGACGGCTGAGTGGCT CGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGTGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGGTCTACATTATG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACTCCATTGTGGGGACAGCCACGTTCGATCAGTACTGGAGCGTGCGCACCTCTAAGCGGACTT 
CAGGAACAGTGACCGTGACCGATCACTTCCGCGCCTGGGCGAACCGGGGCCTGAACCTCGGCACA ATAGACCAAATTACATTG TGCGTGGAGGGTTACCAAAGCTCTGGATCAGCCAACATCACCCAGAACACCTTCTCTCAGGGCTCTTCTTCCGGCAGTTCGGG TGGCTCATCCGGCTCCACAACGACTACTCGCATCGAGTGTGAGAACATGTCCTTGTCCGGACCCTACGTTAGCAGGATCACCA ATCCCTTTAATGGTATTGCGCTGTATGCCAACGGAGACACAGCCCGCGCTACCGTTAACTTCCCCGCAAGTCGCAACTACAAT TTCCGCCTGCGGGGTTGCGGCAACAACAATAATCTTGCCCGTGTGGACCTGAGGATCGACGGACGGACCGTCGGGACCTTTTA TTACCAGGGCACATACCCCTGGGAGGCCCCAATTGACAATGTTTATGTCAGTGCGGGGAGTCATACAGTCGAAATCACTGTTA CTGCGGATAACGGCACATGGGACGTGTATGCCGACTACCTGGTGATACAGTGA (SEQ ID NO: 3323), 146-4 AS (int 5) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTG AGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGAACCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAATCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCGCGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG 
GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGAC GCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTGGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAACCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggctcatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgac tacctggtgatacagtga (SEQ ID NO: 3324), 146-5 AS (EXT 1, 4 int) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTACCAGCTGGCATGATCATGCCTGGGGACCTC 
ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTG TCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGTTCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG TiAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCATCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt cagga acagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggAagttcggg tggcteatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3325), 146-9 AS (ext 3, 5 int) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga 
aatacaaccagaattggcagtctcttggTacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctaTatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcc tTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGATATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCCTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATTCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGATGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAA GCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGACCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccaAtgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg 
tggctcatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagt gcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3326) AS-146-11 (2 ext, 1 int) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagTctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATT ^ AGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCA CGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCAAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG 
AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAA Ctccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggctcatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcaeca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacTat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3327) AS-146-12 (1 ext, int 6) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcqqgacctacqacatctataggacgacacgcgtcaaccagTctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAAAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG 
GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGATGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGAATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAAGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGG GTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGATGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACACATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggcteatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacgga ccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3328) AS-146-13 (ext 5, 4 int) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaaTaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctggtctaccaacccattggtcgagttctacatcgttgagtcctgggggaactggagaccgcctggtgccac 
gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCCGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGGCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGG AGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGGGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGAGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAGGATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGATAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGAC / LAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCATGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACJkATGAT TGGGTCTACGATCTCGTCATTCCAGAGACTCATTkACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccaCtgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggacAt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggctcatccggctccacaacgactactcgcatcgagtgtgagaacatgtccttgtccggaccctacgttagcaggatcaeca atccctttaatggtattgcgctgtacgccaacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaat 
ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggaccttCta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcAgggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3329) AS-146-16 (ext 3, 2 int) atgcaaacaagcattactctgacatccaacgcatccggtacgtttgacggttactattacgaactctggaaggatactggcaa tacaacaatgacggtctacactcaaggtcgcttttcctgccagtggtcgaacatcaataacgcgttgtttaggaccgggaaga aatacaaccagaattggcagtctcttggcacaatccggatcacgtactctgcgacttacaacccaaacgggaactcctacttg tgtatctatggctg gtctaccaacccattggtcgagttctacatcgtAgagtcctgggggaactggagaccgcctggtgccac gtccctgggccaagtgacaatcgatggcgggacctacgacatctataggacgacacgcgtcaaccagcctTGCTTCGCTAAGG ACACTAAGGTCTACTACGAGAATGACACACTGGTTCATTTCGAGTCAATTGAGGACATGTACCATAAGTACGCTTCTCTCGGG AGGGAGGTGCCATTCGACAACGGCTACGCTGTCCCACTGGAGACCGTGTCAGTCTACACGTTCGATCTGAAGACAGGCGAGGT TAAGAGGACGAAGGCTAGCTACATCTACCGGGAGAAGGTGGAGAAGCTCGCCGAGATCCGCCTGTCGAACGGCTACCTCCTGA GGATTACACTCCTGCACCCCGTTCTCGTGTTCCGGAATGGCCTGCAGTGGGTGCCAGCTGCCATGATCAAGCCTGGGGACCTC ATCGTCGGCATTCGCTCGGTTCCAGCGAACGCCGCGACTATTGAGGAGTCTGAGGCCTACTTCCTCGGGCTGTTCGTGGCTGA GGGCACCTCAAATCCTCTCTCCATCACCACGGGCTCCGAGGAGCTGAAGGACTTCATCGTCAGCTTCATTGAGGACCATGATG GGTACACACCAACTGTCGAGGTTCGCAGGGGCCTCTACCGGATCCTGTTCCGCAAGAAGACGGCTGAGTGGCTCGGCGAGCTG GCTACTTCGAACGCCTCTACCAAGGTGGTCCCTGAGAGGGTCCTCAATGCGGGGGAGTCCGCTATCGCTGCCTTCCTCGCTGG CTACCTGGACGGCGATGGGTACCTCACTGAGTCTATTGTGGAGCTGGTCACCAAGTCACGGGAGCTCGCTGACGGGCTGGTGT TCCTCCTGAAGCGCCTGGGCATCACGCCGAGGATTAGCCAGAAGACAATCGAGGGGTCGGTCTACTACCGGATCTACATTACG GGCGAG GATCGCAAGACATTCGAGAAGGTCCTGGAGAAGTCCAGGATCAAGCCAGGGGAGATGAACGAGGGCGGGGTTGGCAG GTACCCACCAGCTCTGGGCAAGTTCCTCGGGAAGCTGTACAGCGAGTTCAGGCTCCCCAAGCGGGACAACGAGACTGCGTACC ACATCCTGACCAGGTCACGGAATGTGTGGTTCACCGAGAAGACGCTCTCCCGGATTGAGGAGTACTTCAGGGAGGCTCTGGAG AAGCTGTCGGAGGCTAGGAAGGCTCTGGAGATGGGCGACAAGCCGGAGCTGCCATTCCCTTGGACAGCGATCACTAAGTACGG GTTCACGGATCGCCAGGTCGCTAACTACAGGACAAGGGGCCTCCCAAAGAGGCCAGAGCTGAAGGAGAAGGTTGTGTCCGCCC 
TCCTGAAGGAGATCGAGAGGCTGGAGGGCGTGGCTAAGCTCGCTCTGGAGACCATTGAGCTCGCTAGGCGCCTGGAGTTCCAT GAGGTTTCCAGCGTGGAGGTCGTTGACTACAATGATTGGGTCTACGATCTCGTCATTCCAGAGACTCATAACTTCATTGCTCC AAATGGGCTCGTGCTCCACAACtccattgtggggacagccacgttcgatcagtactggagcgtgcgcacctctaagcggactt caggaacagtgaccgtgaccgatcacttccgcgcctgggcgaaccggggcctgaacctcggcacaatagaccaaattacattg tgcgtggagggttaccaaagctctggatcagccaacatcacccagaacaccttctctcagggctcttcttccggcagttcggg tggctcatccggctccacaacgactactcgcatcgagtgtgaAaacatgtccttgtccggaccctacgttagcaggatcacca atccctttaatggtattgcgctgtacgccGacggagacacagcccgcgctaccgttaacttccccgcaagtcgcaactacaa t ttccgcctgcggggttgcggcaacaacaataatcttgcccgtgtggacctgaggatcgacggacggaccgtcgggacctttta ttaccagggcacatacccctgggaggccccaattgacaatgtttatgtcagtgcggggagtcatacagtcgaaatcactgtta ctgcggataacggcacatgggacgtgtatgccgactacctggtgatacagtga (SEQ ID NO: 3330)
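In the AS-146-16 sequence above, the lowercase stretches appear to be host-gene sequence and the long uppercase stretch appears to be the inserted intein. That stretch begins with a Cys codon (TGC) and ends with His-Asn codons (CACAAC) immediately before a Ser codon (tcc). Isolated uppercase letters inside lowercase stretches look like point-mutation markers. These readings are inferences from the sequences, not statements in this excerpt. The sketch below checks those junction features on a whitespace-stripped coding sequence, including whether the first C-extein residue is one of the C/T/S residues named in the abstract. The function names are hypothetical, and the code assumes the sequence starts at the start codon.

```python
import re

# Standard genetic code: codons enumerated with bases in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...), one amino-acid letter per codon, "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA (any case); codons with other characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_intein(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest run of uppercase bases.

    Assumption (hypothetical convention): uppercase = intein, lowercase = host
    gene, and single uppercase letters inside lowercase runs are mutation markers.
    """
    runs = [m.span() for m in re.finditer(r"[ACGT]+", seq)]
    if not runs:
        raise ValueError("no uppercase segment found")
    return max(runs, key=lambda span: span[1] - span[0])


def check_junctions(raw: str) -> dict:
    """Report intein-junction features; assumes seq[0] is the first base of the start codon."""
    seq = re.sub(r"\s+", "", raw)  # remove line-wrap whitespace
    start, end = find_intein(seq)
    intein = translate(seq[start:end])
    first_c_extein = translate(seq[end:end + 3])
    return {
        "in_frame": start % 3 == 0 and (end - start) % 3 == 0,
        "starts_with_cys": intein.startswith("C"),
        "ends_with_his_asn": intein.endswith("HN"),
        "c_extein_residue": first_c_extein,
        "c_extein_is_cts": first_c_extein in {"C", "T", "S"},
    }


if __name__ == "__main__":
    # Toy sequence: Met-Ala | intein Cys-Ala-Ala-His-Asn | Ser-stop
    print(check_junctions("atggct TGCGCTGCTCACAAC tcctaa"))
```

The toy input stands in for a real listing. The OCR debris in the transcribed sequences (for example stray non-nucleotide letters) would shift the reading frame, so those sequences would need cleaning before the same function could be applied to them.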
It is understood, therefore, that this invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications within the spirit and scope of the invention as defined by the appended claims, the description above, and/or the attached drawings.
Claims (2)
1. A modified intein protein, characterized in that it comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 2374, 2376-2378, 2383, 2385-2386, 2410, 2412-2413, 2420, 2422-2425, 2428, 2430-2431, 2436, 2442-2444, 2451, 2454-2458, 2460-2461, 2465-2468, 2471-2474, 2477-2480, 2482-2483, 2493-2494, 2500-2501, 2504, 2507-2513, 2517-2519, 2530-2531, 2533-2537, 2539-2543, 2545, 2548-2549, 2555-2557, 2559, 2565, 2569-2571, 2573-2575, 2579-2582, 2584-2597, 2600, 2602-2605, 2607-2621, 2624-2626, 2629-2634, 2636-2639, 2643, 2645-2648, 2650, 2652, 2656-2658, 2661, 2664, 2666-2672, 2674, 2677-2683 and 2685.
2. A nucleic acid, characterized in that it comprises a nucleotide sequence encoding the modified intein protein, the nucleotide sequence being selected from the group consisting of SEQ ID NOs: 2688, 2690-2692, 2697, 2699-2700, 2724, 2726-2727, 2734, 2736-2739, 2742, 2744-2745, 2750, 2756-2758, 2765, 2768-2772, 2774-2775, 2779-2782, 2785-2788, 2791-2794, 2796-2797, 2807-2808, 2814-2815, 2818, 2821-2827, 2831-2833, 2844-2845, 2847-2851, 2853-2857, 2859, 2862-2863, 2869-2871, 2873, 2879, 2883-2885, 2887-2889, 2893-2896, 2898-2911, 2914, 2916-2919, 2921-2935, 2938-2940, 2943-2948, 2950-2953, 2957, 2959-2962, 2964, 2966, 2970-2972, 2975, 2978, 2980-2986, 2988, 2991-2997 and 2999.
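Claims 1 and 2 give their SEQ ID NOs as comma-separated lists with inclusive hyphenated ranges. The sketch below expands such a list and compares the two claims entry by entry. The helper names are hypothetical; this is not a tool from the patent or from any sequence-listing software. On the lists as transcribed here, each nucleic acid SEQ ID NO appears to equal the corresponding protein SEQ ID NO plus 314. That is an observation drawn from the numbers themselves, not something the patent states. The sketch also reads the claim 2 entry that pairs with protein SEQ ID NOs 2579-2582 as 2893-2896.

```python
import re

# SEQ ID NO lists transcribed from claims 1 and 2 (hyphen = inclusive range).
# Assumption: the claim 2 entry paired with protein IDs 2579-2582 is 2893-2896.
PROTEIN_IDS = (
    "2374, 2376-2378, 2383, 2385-2386, 2410, 2412-2413, 2420, 2422-2425, 2428, "
    "2430-2431, 2436, 2442-2444, 2451, 2454-2458, 2460-2461, 2465-2468, 2471-2474, "
    "2477-2480, 2482-2483, 2493-2494, 2500-2501, 2504, 2507-2513, 2517-2519, "
    "2530-2531, 2533-2537, 2539-2543, 2545, 2548-2549, 2555-2557, 2559, 2565, "
    "2569-2571, 2573-2575, 2579-2582, 2584-2597, 2600, 2602-2605, 2607-2621, "
    "2624-2626, 2629-2634, 2636-2639, 2643, 2645-2648, 2650, 2652, 2656-2658, "
    "2661, 2664, 2666-2672, 2674, 2677-2683 and 2685"
)
NUCLEIC_IDS = (
    "2688, 2690-2692, 2697, 2699-2700, 2724, 2726-2727, 2734, 2736-2739, 2742, "
    "2744-2745, 2750, 2756-2758, 2765, 2768-2772, 2774-2775, 2779-2782, 2785-2788, "
    "2791-2794, 2796-2797, 2807-2808, 2814-2815, 2818, 2821-2827, 2831-2833, "
    "2844-2845, 2847-2851, 2853-2857, 2859, 2862-2863, 2869-2871, 2873, 2879, "
    "2883-2885, 2887-2889, 2893-2896, 2898-2911, 2914, 2916-2919, 2921-2935, "
    "2938-2940, 2943-2948, 2950-2953, 2957, 2959-2962, 2964, 2966, 2970-2972, "
    "2975, 2978, 2980-2986, 2988, 2991-2997 and 2999"
)


def expand_seq_ids(text: str) -> list[int]:
    """Expand a list such as '2374, 2376 - 2378 and 2385' into sorted integer IDs."""
    ids = []
    for item in re.split(r",|\band\b", text):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", item)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            if low > high:
                raise ValueError(f"descending range: {item!r}")
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(item))  # malformed entries raise ValueError here
    return sorted(ids)


def id_offsets(protein_text: str, nucleic_text: str) -> set[int]:
    """Pair the two expanded lists in sorted order and return the set of differences."""
    proteins = expand_seq_ids(protein_text)
    nucleics = expand_seq_ids(nucleic_text)
    if len(proteins) != len(nucleics):
        raise ValueError(f"{len(proteins)} protein IDs vs {len(nucleics)} nucleic acid IDs")
    return {n - p for p, n in zip(proteins, nucleics)}


if __name__ == "__main__":
    print(id_offsets(PROTEIN_IDS, NUCLEIC_IDS))
```

A single-element result means the two lists line up at a constant offset. A malformed entry containing more than two numbers raises `ValueError` instead of being accepted silently.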
Similar technologies:
Publication number | Publication date | Title
BR112012010744B1|2020-09-29|MODIFIED INTEIN PROTEIN AND NUCLEIC ACID
US8465958B2|2013-06-18|Polypeptides having endoglucanase activity and polynucleotides encoding same
US11098290B2|2021-08-24|Polypeptides having cellulolytic enhancing activity and polynucleotides encoding same
NZ586014A|2012-07-27|Polypeptides having cellulolytic enhancing activity and polynucleotides encoding same
US9982285B2|2018-05-29|Polypeptides having cellulolytic enhancing activity and polynucleotides encoding same
WO2014155566A1|2014-10-02|Thermostable cellobiohydrolase
Perret et al.|2004|Use of antisense RNA to modify the composition of cellulosomes produced by Clostridium cellulolyticum
US10793845B2|2020-10-06|Polypeptides having cellulolytic enhancing activity and polynucleotides encoding same
US20090148903A1|2009-06-11|Polypeptides having beta-glucosidase activity and polynucleotides encoding same
US9464333B2|2016-10-11|Intein-modified enzymes, their production and industrial applications
US10407742B2|2019-09-10|Intein-modified enzymes, their production and industrial applications
US20100333223A1|2010-12-30|Carbohydrate binding plant hydrolases which alter plant cell walls
US8581042B2|2013-11-12|Polypeptides having beta-glucosidase activity and polynucleotides encoding same
US20150337347A1|2015-11-26|Polypeptides Having Cellulolytic Enhancing Activity And Polynucleotides Encoding Same
US20150232825A1|2015-08-20|Polypeptides Having Cellulolytic Enhancing Activity And Polynucleotides Encoding Same
US9890370B2|2018-02-13|Hyperthermostable endoglucanase
Willis|2016|Modification of carbohydrate active enzymes in switchgrass to improve saccharification and biomass yields for biofuels
Patent family:
Publication number | Publication date
CN102712681B|2016-07-06|
US10196623B2|2019-02-05|
CN102712682B|2016-07-06|
WO2011057101A4|2011-06-30|
WO2011057163A2|2011-05-12|
US20160186157A1|2016-06-30|
BR112012010744A2|2017-09-19|
WO2011057101A1|2011-05-12|
CN102712681A|2012-10-03|
US8420387B2|2013-04-16|
WO2011057163A3|2011-06-30|
CN102712682A|2012-10-03|
US20110111442A1|2011-05-12|
US9303250B2|2016-04-05|
UA115022C2|2017-09-11|
US20130247251A1|2013-09-19|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title

AT218797T|1988-09-06|2002-06-15|Univ Washington|ORAL IMMUNIZATION BY USING TRANSGENIC PLANTS|
JP3209744B2|1990-01-22|2001-09-17|DeKalb Genetics Corporation|Transgenic corn with fruiting ability|
US6946587B1|1990-01-22|2005-09-20|Dekalb Genetics Corporation|Method for preparing fertile transgenic corn plants|
US6022846A|1990-09-21|2000-02-08|Mogen International And Gist-Brocades N.V.|Expression of phytase in plants|
US6395966B1|1990-08-09|2002-05-28|Dekalb Genetics Corp.|Fertile transgenic maize plants containing a gene encoding the pat protein|
US5834247A|1992-12-09|1998-11-10|New England Biolabs, Inc.|Modified proteins comprising controllable intervening protein sequences or their elements methods of producing same and methods for purification of a target protein comprised by a modified protein|
DE69333667T2|1992-12-09|2006-02-02|New England Biolabs, Inc., Beverly|Modified proteins containing controllable intervening sequences and methods for their preparation|
US5496714A|1992-12-09|1996-03-05|New England Biolabs, Inc.|Modification of protein by use of a controllable interveining protein sequence|
EP1642980A1|1995-06-28|2006-04-05|New England Biolabs, Inc.|Modified proteins and methods of their production|
AU2523597A|1996-03-29|1997-10-22|Pacific Enzymes Limited|A xylanase|
US5981835A|1996-10-17|1999-11-09|Wisconsin Alumni Research Foundation|Transgenic plants as an alternative source of lignocellulosic-degrading enzymes|
WO1998021348A1|1996-11-12|1998-05-22|Battelle Memorial Institute|Method of producing human growth factors from whole plants or plant cell cultures|
US6800792B1|1998-10-05|2004-10-05|Prodigene Inc.|Commercial production of laccase in plants|
AT318894T|1998-12-18|2006-03-15|Penn State Res Found|INTEIN-MEDIATED CYCLISATION OF PEPTIDES|
CA2364997A1|1999-03-05|2000-09-08|Maxygen, Inc.|Encryption of traits using split gene sequences|
US6531316B1|1999-03-05|2003-03-11|Maxyag, Inc.|Encryption of traits using split gene sequences and engineered genetic elements|
US20040096938A1|1999-05-24|2004-05-20|Ming-Qun Xu|Method for generating split, non-transferable genes that are able to express an active protein product|
US6858775B1|1999-05-24|2005-02-22|New England Biolabs, Inc.|Method for generating split, non-transferable genes that are able to express an active protein product|
JP2003505012A|1999-05-24|2003-02-12|New England Biolabs, Inc.|Method for producing a disrupted non-transmissible gene capable of expressing an active protein product|
US6521435B1|1999-08-30|2003-02-18|The United States Of America As Represented By The Secretary Of Agriculture|Nucleic acid sequences encoding cell wall-degrading enzymes and use to engineer resistance to Fusarium and other pathogens|
DE60140996D1|2000-02-11|2010-02-25|Metabolix Inc|INTEIN CONTAINING MULTIGEN EXPRESSION CONSTRUCTS|
US20070192900A1|2006-02-14|2007-08-16|Board Of Trustees Of Michigan State University|Production of beta-glucosidase, hemicellulase and ligninase in E1 and FLC-cellulase-transgenic plants|
US7049485B2|2000-10-20|2006-05-23|Board Of Trustees Of Michigan State University|Transgenic plants containing ligninase and cellulase which degrade lignin and cellulose to fermentable sugars|
JP2005503153A|2001-08-27|2005-02-03|Syngenta Participations AG|Self-processing plants and plant parts|
US20030159182A1|2001-08-29|2003-08-21|Eilleen Tackaberry|Production of therapeutic proteins in transgenic cereal crops|
WO2003050265A2|2001-12-10|2003-06-19|Diversa Corporation|Compositions and methods for normalizing assays|
BR0307155A|2002-01-08|2007-06-19|Michael R Raab|Transgenic plants expressing proteins modified by CIVPS or inteins, and related method|
US20030167533A1|2002-02-04|2003-09-04|Yadav Narendra S.|Intein-mediated protein splicing|
US7314974B2|2002-02-21|2008-01-01|Monsanto Technology, Llc|Expression of microbial proteins in plants for production of plants with improved properties|
WO2005024044A2|2003-09-05|2005-03-17|Gtc Biotherapeutics, Inc.|Method for the production of fusion proteins in transgenic mammal milk|
US20080289066A1|2004-03-08|2008-11-20|Lanahan Michael B|Self-Processing Plants and Plant Parts|
CA2567272A1|2004-05-19|2005-12-01|Agrivida, Inc.|Transgenic plants expressing intein modified proteins and associated processes for bio-pharmaceutical production|
CA2638801C|2006-02-14|2016-12-13|Verenium Corporation|Xylanases, nucleic acids encoding them and methods for making and using them|
WO2007146944A2|2006-06-16|2007-12-21|Syngenta Participations Ag|Catalytically inactive proteins and method for recovery of enzymes from plant-derived materials|
US8420387B2|2009-11-06|2013-04-16|Agrivida, Inc.|Intein-modified enzymes, their production and industrial applications|
US9464333B2|2009-11-06|2016-10-11|Agrivida, Inc.|Intein-modified enzymes, their production and industrial applications|
US10407742B2|2009-11-06|2019-09-10|Agrivida, Inc.|Intein-modified enzymes, their production and industrial applications|
ES2774426T3|2012-11-14|2020-07-21|Agrivida Inc|Methods and compositions for processing biomass with high levels of starch|
US9598700B2|2010-06-25|2017-03-21|Agrivida, Inc.|Methods and compositions for processing biomass with elevated levels of starch|
US10443068B2|2010-06-25|2019-10-15|Agrivida, Inc.|Plants with engineered endogenous genes|
UA109141C2|2010-06-25|2015-07-27|TRANSGENIC PLANT WITH INCREASED LEVEL OF PLANT STARCH|
ES2579054T3|2010-08-27|2016-08-04|Agrivida, Inc.|Development of a cellulosic processing characteristic using a xylanase modified by a thermoregulated intein|
CN103547659B|2011-03-07|2016-09-28|Agrivida, Inc.|Combined pretreatment and hydrolysis of plant biomass expressing cell wall degrading enzymes|
US9725483B2|2012-04-16|2017-08-08|Lipotec, S.A.|Compounds for the treatment and/or care of the skin and/or mucous membranes and their use in cosmetic or pharmaceutical compositions|
WO2013181271A2|2012-05-29|2013-12-05|Agrivida, Inc.|Strong constitutive promoters for heterologous expression of proteins in plants|
WO2014004336A2|2012-06-27|2014-01-03|The Trustees Of Princeton University|Split inteins, conjugates and uses thereof|
CA2885931A1|2012-10-03|2014-04-10|Agrivida, Inc.|Intein-modified proteases, their production and industrial applications|
BR102014025574A2|2013-10-15|2015-09-29|Dow Agrosciences Llc|Zea mays regulatory elements and uses thereof|
BR102014025499A2|2013-10-15|2015-09-29|Dow Agrosciences Llc|Zea mays regulatory elements and their use|
EP2883953A1|2013-12-12|2015-06-17|Westfälische Wilhelms-Universität Münster|An atypical naturally split intein engineered for highly efficient protein modification|
WO2016174311A1|2015-04-30|2016-11-03|University Of Helsinki|Ion-inducible protein modification|
BR112019000635A2|2015-07-13|2019-04-30|Universite Laval|Toll-like receptor 2 inhibitor and its use, antibody, composition and method for treating an inflammatory condition|
BR112018005287A2|2015-09-18|2018-12-11|Agrivida Inc|Modified phytases and methods of using them|
WO2018206535A1|2017-05-08|2018-11-15|Novozymes A/S|Carbohydrate-binding domain and polynucleotides encoding the same|
Legal status:
2018-04-10| B06F| Objections, documents and/or translations required after the examination request, under Art. 34 of the Industrial Property Law|
2019-05-14| B06T| Formal requirements before examination|
2019-08-27| B06A| Notification to the applicant to respond to the report of non-patentability or non-compliance of the application, under Art. 36 of the Industrial Property Law|
2020-03-03| B06A| Notification to the applicant to respond to the report of non-patentability or non-compliance of the application, under Art. 36 of the Industrial Property Law|
2020-07-28| B09A| Decision: intention to grant|
2020-09-29| B16A| Patent or certificate of addition of invention granted| Free-format text: Term of validity: 20 (twenty) years counted from 05/11/2010 (2010-11-05), subject to the legal conditions.|
Priority:
Application number | Filing date | Title
US12/590,444|2009-11-06|
US12/590,444|US8420387B2|2009-11-06|2009-11-06|Intein-modified enzymes, their production and industrial applications|
PCT/US2010/055751|WO2011057163A2|2009-11-06|2010-11-05|Intein-modified enzymes, their production and industrial applications|